Preventing Network Bottlenecks: Accelerating Datacenter Services with Hotspot-Aware Placement for Compute and Storage

Hamid Bazzaz

Yingjie Bi

Weiwu Pang

Minlan Yu

Ramesh Govindan

Neal Cardwell

Nandita Dukkipati

Chloe Tsai

Chris DeForeest

Yuxue Jin

Charlie Carver

Jan Kopański

Liqun (Legion) Cheng

Amin Vahdat

2025

Download Google Scholar

Abstract

Datacenter network hotspots, defined as links with persistently high utilization, can lead to performance bottlenecks.In this work, we study hotspots in Google’s datacenter networks. We find that these hotspots occur most frequently at ToR switches and can persist for hours. They are caused mainly by bandwidth demand-supply imbalance, largely due to high demand from network-intensive services, or demand exceeding available bandwidth when compute/storage upgrades outpace ToR bandwidth upgrades. Compounding this issue is bandwidth-independent task/data placement by data-center compute and storage schedulers. We quantify the performance impact of hotspots, and find that they can degrade the end-to-end latency of some distributed applications by over 2× relative to low utilization levels. Finally, we describe simple improvements we deployed. In our cluster scheduler, adding hotspot-aware task placement reduced the number of hot ToRs by 90%; in our distributed file system, adding hotspot-aware data placement reduced p95 network latency by more than 50%. While congestion control, load balancing, and traffic engineering can efficiently utilize paths for a fixed placement, we find hotspot-aware placement – placing tasks and data under ToRs with higher available bandwidth – is crucial for achieving consistently good performance.

Defining the technology of today and tomorrow.

Philosophy

People

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Preventing Network Bottlenecks: Accelerating Datacenter Services with Hotspot-Aware Placement for Compute and Storage

Abstract

Meet the teams driving innovation