Neal Cardwell

Neal Cardwell

Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Internet speed tests are an important tool to enable consumers and regulators to monitor the quality of Internet access. However, increased Internet speeds to the home and an increased demand for speed testing pose scaling challenges to providers of speed tests, who must maintain costly infrastructure to keep up with this demand. In recent years, this has led the popular NDT speed test to limit data transfer to a total of 250MB, which comes at the cost of accuracy for high bandwidth speed test clients. In this paper, we observe that the NDT speed test server’s congestion control algorithm (BBRv1) is also trying to estimate the capacity of the connection. We leverage this observation and signals from BBR to improve the accuracy and efficiency of speed tests. We first show how leveraging signals from BBR can more than double the accuracy of a 10MB test–from 17% to 43%–for clients with speeds over 400Mbps. We then show how using BBR signals to adaptively end the speed test reduces data transfer by 36% and increased accuracy by 13% for high bandwidth clients, relative to a 100MB fixed length test. Even accounting for clients that never observe enough samples to utilize the BBR signal, this adaptive approach still uses 25% less data than a fixed 100MB test with 37-44% higher accuracy. View details
    Preview abstract Datacenter network hotspots, defined as links with persistently high utilization, can lead to performance bottlenecks.In this work, we study hotspots in Google’s datacenter networks. We find that these hotspots occur most frequently at ToR switches and can persist for hours. They are caused mainly by bandwidth demand-supply imbalance, largely due to high demand from network-intensive services, or demand exceeding available bandwidth when compute/storage upgrades outpace ToR bandwidth upgrades. Compounding this issue is bandwidth-independent task/data placement by data-center compute and storage schedulers. We quantify the performance impact of hotspots, and find that they can degrade the end-to-end latency of some distributed applications by over 2× relative to low utilization levels. Finally, we describe simple improvements we deployed. In our cluster scheduler, adding hotspot-aware task placement reduced the number of hot ToRs by 90%; in our distributed file system, adding hotspot-aware data placement reduced p95 network latency by more than 50%. While congestion control, load balancing, and traffic engineering can efficiently utilize paths for a fixed placement, we find hotspot-aware placement – placing tasks and data under ToRs with higher available bandwidth – is crucial for achieving consistently good performance. View details
    Preview abstract The difficulty in gaining visibility into the fine-time scale hop-level congestion state of networks has been a key challenge faced by congestion control protocols for decades. How-ever, the emergence of commodity switches supporting in-network telemetry (INT) enables more advanced congestion control. In this paper, we presentPoseidon, a novel congestion control protocol that exploits INT to address blind spots of end-to-end algorithms and realize several fundamentally advantageous properties. Specifically, Poseidon realizes congestion control for the actual bottleneck hop. In the steady state,Poseidon realizes network-wide max-min fair bandwidth al-location. Furthermore, Poseidon decouples the bandwidth fairness requirement from the traditional AIMD control law, making it possible for Poseidon to converge fast and smooth out bandwidth oscillations. Equally important, Poseidon is de-signed to be amenable to incremental brownfield deployment in networks that mix INT and non-INT switches. Our testbed and simulation experiments show that compared to a widely-deployed state-of-the-art non-INT protocol, Swift, Poseidon improves op latency up to 10x in some percentiles (61% in average), lowers fabric RTT by more than 50%, reduces congestion window ramp up time by 40% while decreasing the throughput variation for flows with small windows by 94%.Finally, it is robust to reverse-path and multi-hop congestion. View details
    Preview abstract We describe our experience with Fathom, a system for identifying the network performance bottlenecks of any service running in the Google fleet. Fathom passively samples RPCs, the principal unit of work for services. It segments the overall latency into host and network components with kernel and RPC stack instrumentation. It records these detailed latency metrics, along with detailed transport connection state, for every sampled RPC. This lets us determine if the completion is constrained by the client, network or server. To scale while enabling analysis, we also aggregate samples into distributions that retain multi-dimensional breakdowns. This provides us with a macroscopic view of individual services. Fathom runs globally in our datacenters for all production traffic, where it monitors billions of TCP connections 24x7. For five years Fathom has been our primary tool for troubleshooting service network issues and assessing network infrastructure changes. We present case studies to show how it has helped us improve our production services. View details
    Preview abstract This document presents the RACK-TLP loss detection algorithm for TCP. RACK-TLP uses per-segment transmit timestamps and selective acknowledgments (SACKs) and has two parts. Recent Acknowledgment (RACK) starts fast recovery quickly using time-based inferences derived from acknowledgment (ACK) feedback, and Tail Loss Probe (TLP) leverages RACK and sends a probe packet to trigger ACK feedback to avoid retransmission timeout (RTO) events. Compared to the widely used duplicate acknowledgment (DupAck) threshold approach, RACK-TLP detects losses more efficiently when there are application-limited flights of data, lost retransmissions, or data packet reordering events. It is intended to be an alternative to the DupAck threshold approach. View details
    BBR: Congestion-Based Congestion Control
    C. Stephen Gunn
    Van Jacobson
    Communications of the ACM, 60 (2017), pp. 58-66
    Preview abstract By all accounts, today’s Internet is not moving data as well as it should. Most of the world’s cellular users experience delays of seconds to minutes; public Wi-Fi in airports and conference venues is often worse. Physics and climate researchers need to exchange petabytes of data with global collaborators but find their carefully engineered multi-Gbps infrastructure often delivers at only a few Mbps over intercontinental distances.6 These problems result from a design choice made when TCP congestion control was created in the 1980s—interpreting packet loss as “congestion.”13 This equivalence was true at the time but was because of technology limitations, not first principles. As NICs (network interface controllers) evolved from Mbps to Gbps and memory chips from KB to GB, the relationship between packet loss and congestion became more tenuous. Today TCP’s loss-based congestion control—even with the current best of breed, CUBIC11—is the primary cause of these problems. When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat. When bottleneck buffers are small, loss-based congestion control misinterprets loss as a signal of congestion, leading to low throughput. Fixing these problems requires an alternative to loss-based congestion control. Finding this alternative requires an understanding of where and how network congestion originates. View details
    BBR: Congestion-Based Congestion Control
    C. Stephen Gunn
    Van Jacobson
    ACM Queue, 14, September-October (2016), pp. 20 - 53
    Preview abstract By all accounts, today’s Internet is not moving data as well as it should. Most of the world’s cellular users experience delays of seconds to minutes; public Wi-Fi in airports and conference venues is often worse. Physics and climate researchers need to exchange petabytes of data with global collaborators but find their carefully engineered multi-Gbps infrastructure often delivers at only a few Mbps over intercontinental distances.6 These problems result from a design choice made when TCP congestion control was created in the 1980s—interpreting packet loss as “congestion.”13 This equivalence was true at the time but was because of technology limitations, not first principles. As NICs (network interface controllers) evolved from Mbps to Gbps and memory chips from KB to GB, the relationship between packet loss and congestion became more tenuous. Today TCP’s loss-based congestion control—even with the current best of breed, CUBIC11—is the primary cause of these problems. When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat. When bottleneck buffers are small, loss-based congestion control misinterprets loss as a signal of congestion, leading to low throughput. Fixing these problems requires an alternative to loss-based congestion control. Finding this alternative requires an understanding of where and how network congestion originates. View details
    Drilling Network Stacks with packetdrill
    Barath Raghavan
    USENIX ;login:, 38 (2013), pp. 48-52
    Preview abstract Testing and troubleshooting network protocols and stacks can be painstaking. To ease this process, our team built packetdrill, a tool that lets you write precise scripts to test entire network stacks, from the system call layer down to the NIC hardware. packetdrill scripts use a familiar syntax and run in seconds, making them easy to use during development, debugging, and regression testing, and for learning and investigation. View details
    Reducing Web Latency: the Virtue of Gentle Aggression
    Tobias Flach
    Barath Raghavan
    Shuai Hao
    Ethan Katz-Bassett
    Ramesh Govindan
    Proceedings of the ACM Conference of the Special Interest Group on Data Communication (SIGCOMM '13), ACM (2013)
    Preview abstract To serve users quickly, Web service providers build infrastructure closer to clients and use multi-stage transport connections. Although these changes reduce client-perceived round-trip times, TCP's current mechanisms fundamentally limit latency improvements. We performed a measurement study of a large Web service provider and found that, while connections with no loss complete close to the ideal latency of one round-trip time, TCP's timeout-driven recovery causes transfers with loss to take five times longer on average. In this paper, we present the design of novel loss recovery mechanisms for TCP that judiciously use redundant transmissions to minimize timeout-driven recovery. Proactive, Reactive, and Corrective are three qualitatively different, easily-deployable mechanisms that (1) proactively recover from losses, (2) recover from them as quickly as possible, and (3) reconstruct packets to mask loss. Crucially, the mechanisms are compatible both with middleboxes and with TCP's existing congestion control and loss recovery. Our large-scale experiments on Google's production network that serves billions of flows demonstrate a 23% decrease in the mean and 47% in 99th percentile latency over today's TCP. View details
    packetdrill: Scriptable Network Stack Testing, from Sockets to Packets
    Lawrence Brakmo
    Matt Mathis
    Barath Raghavan
    Hsiao-keng Jerry Chu
    Tom Herbert
    Proceedings of the USENIX Annual Technical Conference (USENIX ATC 2013), USENIX, 2560 Ninth Street, Suite 215, Berkeley, CA, 94710 USA, pp. 213-218
    Preview abstract Testing today’s increasingly complex network protocol implementations can be a painstaking process. To help meet this challenge, we developed packetdrill, a portable, open-source scripting tool that enables testing the correctness and performance of entire TCP/UDP/IP network stack implementations, from the system call layer to the hardware network interface, for both IPv4 and IPv6. We describe the design and implementation of the tool, and our experiences using it to execute 657 test cases. The tool was instrumental in our development of three new features for Linux TCP—Early Retransmit, Fast Open, and Loss Probes—and allowed us to find and fix 10 bugs in Linux. Our team uses packetdrill in all phases of the development process for the kernel used in one of the world’s largest Linux installations. View details