Junlan Zhou
Junlan Zhou is a Senior Staff Software Engineer / Tech Lead Manager at Google, where she leads network-aware scheduling and deployability of our network products. Her current interests include the evolution of our data center networks, joint network/compute/storage optimization, deployability of our network products, and network efficiency and cost optimization. She holds a Ph.D. from the University of California, Los Angeles, and joined Google in 2007. She has authored or co-authored more than 90 issued patents.
Authored Publications
Hashing Design in Modern Networks: Challenges and Mitigation Techniques
Keqiang He
Minlan Yu
Nick Duffield
Shidong Zhang
Yunhong Xu
2022
Traffic load balancing across multiple paths is a critical task for modern networks to reduce network congestion and improve network efficiency.
Hashing, which is the foundation of traffic load balancing, still faces practical challenges.
The key problem is a growing need for more hash functions, because networks are getting larger, with more switches, more stages, and increased path diversity.
Meanwhile, topology and routing are becoming more agile in order to efficiently serve traffic demands with stricter throughput and latency SLAs.
On the other hand, current-generation switch chips provide only a limited number of uncorrelated hash functions.
We first demonstrate why the limited number of hash functions is a practical challenge in today's datacenter network (DCN) and wide-area network (WAN) designs. Then, to mitigate the problem, we propose a novel approach named color recombining, which enables hash function reuse by leveraging topology traits of multi-stage DCNs. We also describe a novel framework based on coprime theory to mitigate hash correlation in generic mesh topologies (i.e., spineless DCN and WAN). Our evaluation on real network trace data and topologies demonstrates that we can reduce the extent of load imbalance (measured by coefficient of variation) by an order of magnitude.
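The hash correlation problem this abstract refers to can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the paper's color recombining or coprime technique: it models a hypothetical two-stage network with a fanout of 4, uses SHA-256 with a seed as a stand-in for a switch hash function, and compares the coefficient of variation of second-stage link loads when the second stage reuses the first stage's hash versus when it uses an independent one. All function names and parameters are made up for this example.

```python
import hashlib
import statistics


def flow_hash(flow_id, seed):
    """Stand-in for a switch hash function (assumption: seeded SHA-256)."""
    data = f"{seed}:{flow_id}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def coefficient_of_variation(loads):
    """Population standard deviation divided by the mean load."""
    mean = statistics.mean(loads)
    return statistics.pstdev(loads) / mean if mean else 0.0


def second_stage_loads(num_flows, fanout, seed1, seed2):
    """Return per-link flow counts on the uplinks of all second-stage switches.

    Stage 1 uses the hash seeded with seed1 to choose a second-stage switch;
    that switch uses the hash seeded with seed2 to choose an uplink.
    """
    loads = [[0] * fanout for _ in range(fanout)]
    for flow in range(num_flows):
        switch = flow_hash(flow, seed1) % fanout
        port = flow_hash(flow, seed2) % fanout
        loads[switch][port] += 1
    return [count for row in loads for count in row]


# Same hash at both stages: every flow reaching switch k also picks uplink k.
reused = second_stage_loads(10000, 4, seed1=1, seed2=1)
# Independent hash at the second stage: flows spread over all uplinks.
independent = second_stage_loads(10000, 4, seed1=1, seed2=2)

cv_reused = coefficient_of_variation(reused)
cv_independent = coefficient_of_variation(independent)
print(f"CV with reused hash:      {cv_reused:.3f}")
print(f"CV with independent hash: {cv_independent:.3f}")
```

When the hash is reused, 12 of the 16 second-stage uplinks carry no traffic at all, which is why a limited supply of uncorrelated hash functions becomes a problem as the number of stages grows.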
Minimal Rewiring: Efficient Live Expansion for Clos Data Center Networks
Shizhen Zhao
Joon Ong
Proc. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2019), USENIX Association (to appear)
Clos topologies have been widely adopted for large-scale data center networks (DCNs), but it has been difficult to support incremental expansions of Clos DCNs. Some prior work has assumed that it is impossible to design DCN topologies that are both well-structured (non-random) and incrementally expandable at arbitrary granularities.
We demonstrate that it is indeed possible to design such networks, and to expand them while they are carrying live traffic, without incurring packet loss. We use a layer of patch panels between blocks of switches in a Clos network, which makes physical rewiring feasible, and we describe how to use integer linear programming (ILP) to minimize the number of patch-panel connections that must be changed, which makes expansions faster and cheaper. We also describe a block-aggregation technique that makes our ILP approach scalable.
We tested our "minimal-rewiring" solver on two kinds of fine-grained expansions using 2250 synthetic DCN topologies, and found that the solver can handle 99% of these cases while changing under 25% of the connections. Compared to prior approaches, this solver (on average) reduces the number of "stages" per expansion by about 3.1X -- a significant improvement to our operational costs, and to our exposure (during expansions) to capacity-reducing faults.
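As a rough illustration of what "minimizing the number of connections that must be changed" can look like as an ILP, here is a simplified formulation at the block level. It is a sketch under assumptions, not the formulation from the paper: it ignores patch-panel placement and any balance requirements, and the symbols are introduced only for this example. Let $c_{ij}$ be the number of existing links between blocks $i$ and $j$ (zero for newly added blocks), $x_{ij}$ the number of links after the expansion, $r_i$ the number of ports block $i$ must use, and $d_{ij}$ the number of existing links between $i$ and $j$ that have to be disconnected.

```latex
\begin{align*}
\min_{x,\,d}\quad & \sum_{i<j} d_{ij} \\
\text{s.t.}\quad & d_{ij} \ge c_{ij} - x_{ij}, \quad d_{ij} \ge 0 && \forall\, i<j \\
& \sum_{j \ne i} x_{ij} = r_i && \forall\, i \\
& x_{ij} = x_{ji}, \quad x_{ij} \in \mathbb{Z}_{\ge 0} && \forall\, i \ne j
\end{align*}
```

At an optimum, $d_{ij} = \max(0,\, c_{ij} - x_{ij})$, so the objective counts exactly the existing links that must be unplugged and rewired.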
WCMP: Weighted Cost Multipathing for Improved Fairness in Data Centers
Malveeka Tewari
Min Zhu
Abdul Kabbani
EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems (2014), Article No. 5
Data center topologies employ multiple paths among servers to deliver scalable, cost-effective network capacity. The simplest and most widely deployed approach for load balancing among these paths, Equal Cost Multipath (ECMP), hashes flows among the shortest paths toward a destination. ECMP leverages uniform hashing of balanced flow sizes to achieve fairness and good load balancing in data centers. However, we show that ECMP further assumes a balanced, regular, and fault-free topology, which are invalid assumptions in practice that can lead to substantial performance degradation and, worse, variation in flow bandwidths even for same-size flows.
We present a set of simple algorithms that achieve Weighted Cost Multipath (WCMP) to balance traffic in the data center based on the changing network topology. The state required for WCMP is already disseminated as part of standard routing protocols and it can be readily implemented in the current switch silicon without any hardware modifications. We show how to deploy WCMP in a production OpenFlow network environment and present experimental and simulation results to show that variation in flow bandwidths can be reduced by as much as 25X by employing WCMP relative to ECMP.
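To make the difference between equal and weighted multipath concrete, the sketch below shows one simple way to realize weights on top of a hash-indexed next-hop table: each next hop is listed as many times as its weight, and a flow's hash selects a table entry. This is a minimal illustration under assumptions, not necessarily the algorithms presented in the paper; the port names, the weights, and the use of SHA-256 as the flow hash are hypothetical.

```python
import hashlib
from collections import Counter


def flow_hash(flow_id):
    """Stand-in for a switch flow hash (assumption: SHA-256 of the flow id)."""
    digest = hashlib.sha256(str(flow_id).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_multipath_table(weights):
    """Expand {next_hop: integer weight} into a table with replicated entries."""
    table = []
    for next_hop, weight in weights.items():
        table.extend([next_hop] * weight)
    return table


def pick_next_hop(table, flow_id):
    """Hash the flow onto one table entry; a flow always maps to the same hop."""
    return table[flow_hash(flow_id) % len(table)]


# Hypothetical case: uplink_b has half the downstream capacity of uplink_a,
# so it gets half the weight. With weights of 1 and 1 this reduces to ECMP.
table = build_multipath_table({"uplink_a": 2, "uplink_b": 1})
counts = Counter(pick_next_hop(table, flow) for flow in range(30000))
ratio = counts["uplink_a"] / counts["uplink_b"]
print(counts, f"ratio={ratio:.2f}")
```

Replicating entries trades table space for precision: larger integer weights approximate a desired traffic split more closely but consume more entries in the multipath table.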