Firmament: Fast, Centralized Cluster Scheduling at Scale

Ionel Gog

Malte Schwarzkopf

Adam Gleave

Robert N. M. Watson

Steven Hand

12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), USENIX Association (2016), pp. 99-115 (to appear)


Abstract

Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at subsecond placement latency, even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and by applying problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20× over Quincy [22], a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6×.
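
To make the abstract's core idea concrete, the following is a minimal sketch of task placement as a min-cost max-flow problem, loosely modeled on a Quincy-style flow network and heavily simplified. Everything here is an illustrative assumption, not Firmament's code or cost model: the names `MinCostFlow` and `schedule`, the per-task cost matrix, the per-machine slot counts, and the single fixed `unscheduled_penalty` are all hypothetical. The solver is plain successive shortest paths with Bellman-Ford, chosen only for brevity; it is not one of the faster or incremental MCMF algorithms the abstract refers to.

```python
# Toy min-cost max-flow task placement, loosely in the style of a Quincy-like
# flow network. All names and the cost model are hypothetical illustrations.
INF = float("inf")


class MinCostFlow:
    """Min-cost max-flow via successive shortest paths (Bellman-Ford)."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # node -> list of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        """Add arc u->v; its residual twin v->u is stored at index ^ 1."""
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t):
        """Push as much flow as possible from s to t at minimum total cost."""
        flow = total_cost = 0
        while True:
            # Bellman-Ford tolerates the negative costs on residual arcs.
            dist = [INF] * self.n
            prev = [-1] * self.n  # edge used to reach each node
            dist[s] = 0
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == INF:
                        continue
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                            dist[v] = dist[u] + self.cost[e]
                            prev[v] = e
                            changed = True
                if not changed:
                    break
            if dist[t] == INF:
                return flow, total_cost
            # Bottleneck capacity along the cheapest augmenting path.
            push, v = INF, t
            while v != s:
                push = min(push, self.cap[prev[v]])
                v = self.to[prev[v] ^ 1]
            # Augment: consume forward capacity, grant residual capacity.
            v = t
            while v != s:
                self.cap[prev[v]] -= push
                self.cap[prev[v] ^ 1] += push
                v = self.to[prev[v] ^ 1]
            flow += push
            total_cost += push * dist[t]


def schedule(task_costs, slots, unscheduled_penalty):
    """Place tasks on machines; task_costs[i][m] is a hypothetical placement cost.

    Returns ({task: machine}, total_cost). Tasks missing from the dict stay
    unscheduled, paying unscheduled_penalty instead of a placement cost.
    """
    num_tasks, num_machines = len(task_costs), len(slots)
    # Node layout: source, tasks, machines, unscheduled aggregator, sink.
    source, first_task = 0, 1
    first_machine = first_task + num_tasks
    unscheduled = first_machine + num_machines
    sink = unscheduled + 1
    g = MinCostFlow(sink + 1)
    arc = {}  # (task, machine) -> index of the task->machine edge
    for i in range(num_tasks):
        g.add_edge(source, first_task + i, 1, 0)  # each task is one unit of flow
        for m in range(num_machines):
            arc[(i, m)] = len(g.to)
            g.add_edge(first_task + i, first_machine + m, 1, task_costs[i][m])
        g.add_edge(first_task + i, unscheduled, 1, unscheduled_penalty)
    for m in range(num_machines):
        g.add_edge(first_machine + m, sink, slots[m], 0)  # per-machine capacity
    g.add_edge(unscheduled, sink, num_tasks, 0)
    _, total_cost = g.solve(source, sink)
    # A saturated task->machine arc means the task runs on that machine.
    placement = {i: m for (i, m), e in arc.items() if g.cap[e] == 0}
    return placement, total_cost


placement, cost = schedule([[1, 5], [2, 3], [4, 1]], slots=[1, 2], unscheduled_penalty=10)
print(placement, cost)  # → {0: 0, 1: 1, 2: 1} 5
```

In this toy instance machine 0 has one slot and machine 1 has two, so the cheapest feasible assignment puts task 0 on machine 0 and tasks 1 and 2 on machine 1, for a total cost of 5. The unscheduled aggregator lets the solver leave a task waiting whenever every placement would cost more than the penalty. Rescheduling all tasks amounts to re-solving this graph over the whole cluster, which is why solver latency dominates at the scale the abstract describes.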
