Daniel Golovin
Daniel Golovin currently leads the Google DeepMind group located in Pittsburgh. He works broadly on machine learning driven optimization and automated experimental design -- looking at how we can leverage ML to automatically design superior systems and products across the board.
He founded Vizier which is widely used across Google, including what might be the largest AI-driven culinary optimization in history - the ML Cookie experiment.
He founded Vizier which is widely used across Google, including what might be the largest AI-driven culinary optimization in history - the ML Cookie experiment.
Authored Publications
Sort By
SmartChoices: Augmenting Software with Learned Implementations
Eric Yawei Chen
Νikhil Sarda
arXiv (2023)
Preview abstract
We are living in a golden age of machine learning. Powerful models are being trained to perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying those models in existing software systems remains difficult. In this paper we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We explain the overall design philosophy and present case studies using SmartChoices within large scale industrial systems.
View details
Preview abstract
Single-objective black box optimization (also known as zeroth-order
optimization) is the process of minimizing a scalar objective $f(x)$, given evaluations at adaptively chosen inputs $x$. In this paper, we
consider multi-objective optimization, where $f(x)$ outputs a vector of
possibly competing objectives and the goal is to converge to the Pareto frontier. Quantitatively, we wish to maximize the standard \emph{hypervolume indicator} metric, which measures the dominated hypervolume of the entire set of chosen inputs. In this paper, we introduce a novel scalarization function, which we term the \emph{hypervolume scalarization}, and show that drawing random scalarizations from an appropriately chosen distribution can be used to efficiently approximate the \emph{hypervolume indicator} metric. We utilize this connection to show that Bayesian optimization with our scalarization via common acquisition functions, such as Thompson Sampling or Upper Confidence Bound, provably converges to the whole Pareto frontier by deriving tight \emph{hypervolume regret} bounds on the order of $\widetilde{O}(\sqrt{T})$. Furthermore, we highlight the general utility of our scalarization framework by showing that any provably convergent single-objective optimization process can be converted to a multi-objective optimization process with provable convergence guarantees.
View details
Gradientless Descent: High-Dimensional Zeroth-Order Optimization
ICLR 2020 (2020)
Preview abstract
Zeroth-order optimization is the process of minimizing an objective $f(x)$ given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithm from a novel geometric perspective and we derive two invariance properties of our algorithm: monotone and affine invariance. Specifically, for {\it any monotone transform} of a smooth and strongly convex objective with latent dimension $k$, then we present a novel analysis that shows convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, where the input dimension is $n$, $R$ is the diameter of the input space and $Q$ is the condition number. Our rates are the first of its kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. From our geometric perspective, we can further show that our analysis is optimal. We emphasize that monotone and affine invariance are key to the empirical success of gradientless algorithms, as demonstrated on BBOB and MuJoCo benchmarks.
View details
Bayesian Optimization for a Better Dessert
Subhodeep Moitra
D. Sculley
Proceedings of the 2017 NIPS Workshop on Bayesian Optimization, December 9, 2017, Long Beach, USA (to appear)
Preview abstract
We present a case study on applying Bayesian Optimization to a complex real-world system; our challenge was to optimize chocolate chip cookies. The process was a mixed-initiative system where both human chefs, human raters, and a machine optimizer participated in 144 experiments. This process resulted in highly rated cookies that deviated from expectations in some surprising ways -- much less sugar in California, and cayenne in Pittsburgh. Our experience highlights the importance of incorporating domain expertise and the value of transfer learning approaches.
View details
Black Box Optimization via a Bayesian-Optimized Genetic Algorithm
Advances in Neural Information Processing Systems 30 (NIPS 2017) (to appear)
Preview abstract
We present a simple and robust optimization algorithm related to genetic algorithms, and with analogies to the popular CMA-ES search algorithm, that serves as a cheap alternative to Bayesian Optimization. The algorithm is robust against both monotonic transforms of the objective function value and affine transformations of the feasible region. It is fast and easy to implement, and has performance comparable to CMA-ES on a suite of benchmarks while spending less CPU in the optimization algorithm, and can exhibit better overall performance than Bayesian Optimization when the objective function is cheap.
View details
Google Vizier: A Service for Black-Box Optimization
Subhodeep Moitra
D. Sculley
ACM (2017), pp. 1487-1495
Preview abstract
Any sufficiently complex system acts as a black box when it becomes easier to
experiment with than to understand. Hence, black-box optimization has become
increasingly important as systems have become more complex. In this paper we
describe Google Vizier, a Google-internal service for performing
black-box optimization that has become the de facto parameter tuning
engine at Google. Google Vizier is used to optimize many of our machine
learning models and other systems, and also provides core capabilities to
Google's Cloud Machine Learning HyperTune subsystem. We discuss our
requirements, infrastructure design, underlying algorithms, and advanced
features such as transfer learning and automated early stopping that the
service provides.
View details
Machine Learning: The High Interest Credit Card of Technical Debt
D. Sculley
Eugene Davydov
Dietmar Ebner
Vinay Chaudhary
Michael Young
SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop)
Preview abstract
Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is highlight several machine learning specific risk factors and design patterns to be avoided or refactored
where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
View details
Ad Click Prediction: a View from the Trenches
D. Sculley
Michael Young
Dietmar Ebner
Julian Grady
Lan Nie
Eugene Davydov
Sharat Chikkerur
Dan Liu
Arnar Mar Hrafnkelsson
Tom Boulos
Jeremy Kubica
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2013)
Preview abstract
Predicting ad click--through rates (CTR) is a massive-scale learning
problem that is central to the multi-billion dollar online
advertising industry. We present a selection of case studies and
topics drawn from recent experiments in the setting of a deployed
CTR prediction system. These include improvements in the context of
traditional supervised learning based on an FTRL-Proximal online
learning algorithm (which has excellent sparsity and convergence
properties) and the use of per-coordinate learning rates.
We also explore some of the challenges that arise in a real-world
system that may appear at first to be outside the domain of
traditional machine learning research. These include useful tricks
for memory savings, methods for assessing and visualizing
performance, practical methods for providing confidence estimates
for predicted probabilities, calibration methods, and methods for
automated management of features. Finally, we also detail several
directions that did not turn out to be beneficial for us, despite
promising results elsewhere in the literature. The goal of this
paper is to highlight the close relationship between theoretical
advances and practical engineering in this industrial setting, and
to show the depth of challenges that appear when applying
traditional machine learning methods in a complex dynamic system.
View details
Large-Scale Learning with Less RAM via Randomization
D. Sculley
Michael Young
Proceedings of the 30 International Conference on Machine Learning (ICML) (2013), pp. 10
Preview abstract
We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding. Compared to standard 32-bit float encodings, this
reduces RAM usage by more than 50% during training and by up to 95% when making predictions from a fixed model, with almost no loss in accuracy. We also show that randomized counting can be used to implement per-coordinate learning rates, improving model quality with little additional RAM. We prove these memory-saving methods achieve regret guarantees similar to their exact variants. Empirical evaluation confirms excellent performance, dominating standard approaches across memory versus accuracy tradeoffs.
View details