Streambox: Modern Stream Processing on a Multicore Machine

Felix Xiaozhu Lin
Gennady Pekhimenko
Heejin Park
Hongyi Xin
Myeongjae Jeon
The USENIX Annual Technical Conference, San Jose, CA. (2017)
Google Scholar

Abstract

To monitor and respond to events in real time, stream analytics
have a soaring demand for high throughput and
low latency. Central to meeting demand, even in a distributed
system, is the performance of a single machine.
This paper presents StreamBox, a novel stream processing
engine that exploits the parallelism and memory hierarchies
in modern multicore hardware. StreamBox executes
a pipeline of transforms over records that may arrive
out-of-order. For each transform, it groups records
in ordered epochs based on watermark timestamps that
guarantee no subsequent record timestamp will precede
it. The key contribution of this work is the generalization
of out-of-order record processing to out-of-order
epoch processing per transform to produce abundant parallelism.
We introduce a data structure called cascading
containers that manages dependences and concurrency
among multiple concurrent epochs in each transform
and in the pipeline, maximizing available parallelism
while minimizing synchronization overheads. StreamBox
creates sequential memory layout of records based
on temporal windows and steers record flows to optimize
NUMA locality. On a 56-core machine, StreamBox processes
up to 38M records per second (38 GB/s), which is
comparable to a cluster of 100 – 200 CPU cores, while
reducing the pipeline delay by 20× to 50 ms.