Sharing Decoders: Network Fission for Multi-task Pixel Prediction

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, IEEE/CVF (2022), pp. 3771-3780

Abstract

We examine the benefits of splitting encoder-decoders for multi-task learning and showcase results on three tasks (semantics, surface normals, and depth) while adding very few FLOPS per task. Current hard parameter sharing methods for multi-task pixel-wise labeling use one shared encoder with separate decoders for each task. We generalize this notion and term the splitting of encoder-decoder architectures at different points as fission. Our ablation studies on fission show that sharing most of the decoder layers in multi-task encoder-decoder networks results in improvement while adding far fewer parameters per task. Our proposed method trains faster, uses less memory, results in better accuracy, and uses significantly fewer floating point operations (FLOPS) than conventional multi-task methods, with additional tasks only requiring 0.017% more FLOPS than the single-task network.
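
For intuition, the following is a minimal PyTorch-style sketch of the late-fission idea described above: the encoder and the decoder trunk are shared across tasks, and each task adds only a lightweight 1x1 prediction head at the fission point. The module layout, channel widths, and class names here are illustrative placeholders, not the architecture from the paper.

    # Illustrative sketch (not the authors' code): a late-fission multi-task
    # encoder-decoder. The encoder and most decoder layers are shared; each
    # task adds only a small 1x1 head, so extra FLOPS per task are tiny.
    import torch
    import torch.nn as nn

    class LateFissionNet(nn.Module):
        def __init__(self, num_classes=19):
            super().__init__()
            # Shared encoder (downsampling); channel sizes are placeholders.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            # Shared decoder trunk (upsampling), also shared by all tasks.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            # Fission point: per-task heads, each adding very few parameters.
            self.seg_head = nn.Conv2d(64, num_classes, 1)   # semantic segmentation
            self.normal_head = nn.Conv2d(64, 3, 1)          # surface normals
            self.depth_head = nn.Conv2d(64, 1, 1)           # depth

        def forward(self, x):
            features = self.decoder(self.encoder(x))
            return {
                "semantics": self.seg_head(features),
                "normals": self.normal_head(features),
                "depth": self.depth_head(features),
            }

    if __name__ == "__main__":
        net = LateFissionNet()
        out = net(torch.randn(1, 3, 256, 256))
        print({k: tuple(v.shape) for k, v in out.items()})

Splitting earlier (e.g., giving each task its own full decoder) corresponds to the conventional hard parameter sharing baseline; the sketch above represents the opposite end of the fission spectrum, where only the final prediction layers are task-specific.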

Research Areas