CliqueMap: Productionizing an RMA-Based Distributed Caching System

Aditya Akella
Amanda Strominger
Arjun Singhvi
Maggie Anderson
Rob Cauble
Thomas F. Wenisch
SIGCOMM 2021 (to appear)

Abstract

Distributed caching is a key component in the design of performant, scalable Internet services, but accessing such caches via RPC incurs high cost. Remote Memory Access (RMA) offers a promising, less costly alternative. However, achieving a rich production feature set with RMA-based systems is a significant challenge, because it is RPC's richer abstraction that lends itself to meeting the interoperability and upgradeability requirements of real systems. This work describes CliqueMap, a fully productionized RMA/RPC hybrid serving and caching system, and the production experience gained from three years of operation in Google's datacenters. Built on internal technologies, CliqueMap serves multiple internal product areas and underlies several end-user-visible services.