Practical Performance Guarantees for Pipelined DNN Inference

Kuikui Liu
Proceedings of the 41st International Conference on Machine Learning (2024), pp. 1655-1671

Abstract

This work optimizes pipeline parallelism for machine learning model inference by
partitioning computation graphs into $k$ stages and minimizing the running time of the bottleneck stage.
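As a minimal formalization (with the illustrative assumption that each node $v$ has a known work cost $w(v)$ and that inter-stage communication costs are omitted), the objective is

$$\min_{V_1, \dots, V_k} \; \max_{1 \le i \le k} \; \sum_{v \in V_i} w(v),$$

where $V_1, \dots, V_k$ ranges over partitions of the node set of the computation graph, typically constrained in the pipelined setting so that edges cross only from earlier stages to later ones.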
We design practical algorithms for this NP-hard problem
and show that they are nearly optimal in practice by comparing against lower bounds
obtained from solving novel mixed-integer programming (MIP) formulations.
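To illustrate the lower-bounding idea, the sketch below is a generic assignment MIP (written with the PuLP modeling library), not the paper's novel formulations: it drops precedence and communication constraints, so every feasible pipeline partition remains feasible in the relaxed model and its optimum is a valid lower bound on the bottleneck.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

def bottleneck_lower_bound(costs, k):
    """Generic assignment MIP whose optimum lower-bounds any k-stage
    pipeline partition under the given per-node cost estimates.
    (Precedence/communication constraints are intentionally dropped.)
    """
    n = len(costs)
    prob = LpProblem("bottleneck_partition", LpMinimize)
    # x[v][s] == 1 iff node v is assigned to stage s.
    x = [[LpVariable(f"x_{v}_{s}", cat=LpBinary) for s in range(k)]
         for v in range(n)]
    # B models the load of the heaviest (bottleneck) stage.
    B = LpVariable("bottleneck", lowBound=0)
    prob += B  # objective: minimize the bottleneck load
    for v in range(n):
        # Every node is assigned to exactly one stage.
        prob += lpSum(x[v][s] for s in range(k)) == 1
    for s in range(k):
        # No stage's total load may exceed the bottleneck value B.
        prob += lpSum(costs[v] * x[v][s] for v in range(n)) <= B
    prob.solve()
    return B.value()

# Example usage with made-up node costs:
# print(bottleneck_lower_bound([3.0, 1.0, 2.0, 4.0, 2.5], k=2))
```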
We apply these algorithms and lower-bound techniques
to production models, achieving substantially stronger approximation guarantees
than those given by simple combinatorial lower bounds.
For example, our new MIP formulations improve the lower bounds enough to
drop the geometric mean approximation ratio from $2.175$ to $1.082$ across
production data with $k=16$ pipeline stages.
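To make the reported metric concrete, the snippet below shows how a geometric-mean approximation ratio is computed from per-instance algorithm costs and proven lower bounds; the numbers are hypothetical, purely for illustration.

```python
import math

def geomean_approx_ratio(algo_costs, lower_bounds):
    """Geometric mean of per-instance ratios algo_cost / lower_bound."""
    ratios = [a / l for a, l in zip(algo_costs, lower_bounds)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical illustration only (not the paper's data):
print(geomean_approx_ratio([10.8, 7.9, 12.4], [10.0, 7.5, 11.6]))
```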
This work shows that while bottleneck partitioning is theoretically hard,
in practice we have a handle on the algorithmic side of the problem, and
much of the remaining challenge lies in developing more accurate cost models
to feed into the partitioning algorithms.