Load is not what you should balance: Introducing Prequal

Bartek Wydrowski

Bobby Kleinberg

Steve Rumble

Aaron Archer

(2024)


Abstract

We present Prequal (Probing to Reduce Queuing and Latency), a load balancer
for distributed multi-tenant systems. Prequal aims to minimize
real-time request latency in the presence of heterogeneous server
capacities and non-uniform, time-varying antagonist load. It actively probes
server load to leverage the power of d choices
paradigm, extending it with asynchronous and reusable probes. Cutting
against received wisdom, Prequal does not balance CPU load, but instead
selects servers according to estimated latency and active requests-in-flight
(RIF). We explore its major design features on a testbed system
and evaluate it on YouTube, where it has been deployed for more than two years.
Prequal has dramatically decreased tail latency, error rates, and resource use,
enabling YouTube and other production systems at Google to run at much higher
utilization.
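
To make the selection idea concrete, here is a minimal Python sketch of probing d servers and choosing among them by RIF and estimated latency. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the two signals are combined, so the threshold rule below (prefer the lowest-latency server whose RIF is under a cutoff, otherwise fall back to the lowest RIF) is an assumption. The names `Probe`, `probe_servers`, `select`, and `rif_threshold` are hypothetical, and the sketch probes synchronously, so it does not model the asynchronous, reusable probes the abstract describes.

```python
import random
from dataclasses import dataclass


@dataclass
class Probe:
    """One probe response (hypothetical shape, for illustration only)."""
    server: str
    rif: int            # requests in flight reported by the server
    latency_ms: float   # the server's estimated latency


def probe_servers(load, d, rng):
    """Sample d distinct servers and return their probe responses.

    `load` maps server name -> (rif, latency_ms) and stands in for the
    network round trip a real probe would make.
    """
    names = rng.sample(sorted(load), d)
    return [Probe(name, *load[name]) for name in names]


def select(probes, rif_threshold):
    """Pick a server from the probe responses.

    Assumed rule: among servers whose RIF is below the threshold, take
    the one with the lowest estimated latency. If every probed server is
    at or above the threshold, take the one with the lowest RIF.
    """
    below = [p for p in probes if p.rif < rif_threshold]
    if below:
        return min(below, key=lambda p: p.latency_ms)
    return min(probes, key=lambda p: p.rif)


if __name__ == "__main__":
    # Made-up example values, not measurements from the paper.
    load = {"s1": (2, 30.0), "s2": (9, 5.0), "s3": (3, 12.0), "s4": (1, 40.0)}
    rng = random.Random(0)
    probes = probe_servers(load, d=3, rng=rng)
    print(select(probes, rif_threshold=5).server)
```

Note that neither branch looks at CPU load: the choice depends only on the two signals named in the abstract.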
