Large-scale optimization

Background

We apply techniques from areas such as combinatorial optimization, online algorithms, and control theory to make Google’s big computational infrastructure do more with less. We combine online and offline optimizations to achieve such goals as increasing throughput, decreasing latency, minimizing resource contention, maximizing the efficacy of caches, and eliminating unnecessary work in distributed systems. Our research is used in critical infrastructure that supports products such as Search and Cloud.

Some of our projects

Consistent Hashing

We design memoryless balanced allocation algorithms to assign a dynamic set of tasks to a dynamic set of servers such that the load (the number of assigned tasks) on each server is bounded, and the allocation does not change by much for every update operation.

Distributed Optimization Based on Core-Sets

Composable core-sets provide an effective method for solving optimization problems on massive datasets. The main idea is to partition data among some number of machines, then use each machine to compute some small summary/sketch of the data. After gathering all summaries on one machine, we solve the original optimization problem.

Large-Scale Set Cover

Maximum coverage and minimum set-cover problems are among the fundamental problems in combinatorial optimization and have a variety of applications in machine learning and data mining. We study these problems in the streaming and MapReduce models of computation, and develop practical algorithms with tight theoretical bounds for runtime, space and approximation guarantee. Our main idea is a novel sketching technique that compresses the input into a small space, independent of the size of the ground set, without hurting the quality by much.

Search Infrastructure Optimization

Several of our projects have partnered with product teams to help improve the efficiency of Google's search infrastructure. In the backend for web search, we introduced a distributed feedback control loop to govern the way queries are fanned out to worker machines, in order to balance load better by taking advantage of real-time load information on the workers. We also improved the efficacy of caching by increasing the homogeneity of the stream of queries seen by any single worker machine. To accomplish this, we clustered query terms, used these clusters to define voting weights, and assigned queries to workers on the basis of votes cast by the terms in the query. In a pub-sub system that powers some of our display ads, we saved computation by clustering some components of subscriptions in a way that lets us take advantage of bitwise-parallel CPU operations.

Featured publications

Scalable Feature Selection via Distributed Diversity Maximization

Sepehr Abbasi Zadeh

Mehrdad Ghadiri

Vahab S. Mirrokni

Morteza Zadimoghaddam

AAAI (2017), pp. 2876-2883

Distributed Coverage Maximization via Sketching

MohammadHossein Bateni

Hossein Esfandiari

Vahab Mirrokni

arXiv (2016) (to appear)

Greedy Column Subset Selection: New Bounds and Distributed Algorithms

Aditya Bhaskara

Afshin Rostamizadeh

Jason Altschuler

Morteza Zadimoghaddam

Thomas Fu

Vahab Mirrokni

ICML (2016) (to appear)

Almost Optimal Streaming Algorithms for Coverage Problems

Mohammadhossein Bateni

Hossein Esfandiari

Vahab Mirrokni

29th ACM Symposium on Parallelism in Algorithms and Architectures (2017)

Composable core-sets for diversity and coverage maximization

Piotr Indyk

Sepideh Mahabadi

Mohammad Mahdian

Vahab S. Mirrokni

PODS (2014), pp. 100-108

Composable core-sets for diversity and coverage maximization

Piotr Indyk

Sepideh Mahabadi

Mohammad Mahdian

Vahab S. Mirrokni

PODS (2014), pp. 100-108

Distributed Balanced Clustering via Mapping Coresets

Mohammadhossein Bateni

Aditya Bhaskara

Silvio Lattanzi

Vahab Mirrokni

NIPS, Neural Information Processing Systems Foundation (2014)

Randomized Composable Core-sets for Distributed Submodular Maximization

Vahab S. Mirrokni

Morteza Zadimoghaddam

STOC (2015), pp. 153-162

Submodular secretary problems with extensions

Mohammadhossein Bateni

MohammadTaghi Hajiaghayi

Morteza Zadimoghaddam

ACM Transactions on Algorithms, 9 (4) (2013)

Preview

Consistent Hashing with Bounded Loads

Vahab Mirrokni

Mikkel Thorup

Morteza Zadimoghaddam

Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (2018), pp. 587-604

Defining the technology of today and tomorrow.

Philosophy

People

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Large-scale optimization

Background

Some of our projects

Consistent Hashing

Distributed Optimization Based on Core-Sets

Large-Scale Set Cover

Search Infrastructure Optimization

Featured publications

Join us