Necro-reaper: Pruning away Dead Memory Traffic in Warehouse-Scale Computers

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Association for Computing Machinery (2025)

Abstract

Memory bandwidth is emerging as a critical bottleneck in warehouse-scale computing (WSC). This work reveals that a significant portion of memory traffic in WSC is surprisingly unnecessary, consisting of unnecessary writebacks of deallocated data and fetches of uninitialized data. This issue is particularly acute in WSC, where short-lived heap allocations bigger than a cache line are prevalent. To address this problem, this work proposes a pragmatic approach tailored to WSC. Leveraging the existing WSC ecosystem of vertical integration, profile-guided compilation flows, and customized memory allocators, this work presents Necro-reaper, a novel software/hardware co-design that avoids dead memory traffic without requiring the hardware tracking of prior work. New ISA instructions enable the hardware to avoid unnecessary dead traffic, while extended software components, including a profile-guided compiler and memory allocator, optimize the utilization of these instructions. Evaluation across a diverse set of 10 WSC workloads demonstrates that Necro-reaper achieves a geomean memory traffic reduction of 26% and a geomean IPC increase of 6%.