Multi-Task Adapters for On-Device Audio Inference

Dominik Roblek
IEEE Signal Processing Letters, vol. 27, pp. 630-634

Abstract

The deployment of deep networks on mobile devices requires efficient use of
scarce computational resources, whether measured as available memory or as
computing cost. When addressing multiple tasks simultaneously, it is extremely
important to share resources across tasks, especially when they all consume
the same input data, e.g., audio samples captured by the on-board microphones.
In this paper we propose a multi-task model architecture that consists of a
shared encoder and multiple task-specific adapters. During training, we learn
the model parameters as well as the allocation of the additional task-specific
resources across both tasks and layers. A global tuning parameter controls the
trade-off between overall cost and the level of accuracy achieved across
tasks, yielding a family of multi-task network configurations. Our results
show that this solution significantly outperforms a multi-head model baseline.
Interestingly, we observe that the optimal resource allocation depends both on
the intrinsic characteristics of each task and on the targeted cost measure
(e.g., memory or computing cost).
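To make the architecture concrete, the following is a minimal PyTorch sketch of
a shared encoder with per-task, per-layer adapters, as described in the
abstract. It is not the paper's reference implementation: the class names
(Adapter, MultiTaskAudioNet), the bottleneck 1-D convolutional adapter design,
and the fixed per-layer adapter widths (which stand in for the learned,
cost-driven resource allocation) are all assumptions made for illustration.

```python
# Minimal sketch (assumed structure, not the paper's code): a shared encoder
# whose features are refined by task-specific residual adapters, followed by
# lightweight per-task heads. Adapter widths per layer are hypothetical
# hyperparameters standing in for the learned resource allocation.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Residual bottleneck adapter; `width` controls its task-specific cost."""

    def __init__(self, channels: int, width: int):
        super().__init__()
        self.down = nn.Conv1d(channels, width, kernel_size=1)
        self.up = nn.Conv1d(width, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))


class MultiTaskAudioNet(nn.Module):
    def __init__(self, tasks: dict, channels: int = 64, num_layers: int = 4):
        super().__init__()
        # Shared convolutional encoder, reused by every task.
        self.encoder = nn.ModuleList(
            [nn.Conv1d(1 if i == 0 else channels, channels,
                       kernel_size=3, padding=1) for i in range(num_layers)]
        )
        # One adapter per (task, layer); the per-layer widths are where the
        # paper's learned allocation would plug in (fixed here for brevity).
        self.adapters = nn.ModuleDict({
            task: nn.ModuleList([Adapter(channels, w) for w in widths])
            for task, (widths, _) in tasks.items()
        })
        # Lightweight task-specific classification heads.
        self.heads = nn.ModuleDict({
            task: nn.Linear(channels, num_classes)
            for task, (_, num_classes) in tasks.items()
        })

    def forward(self, audio: torch.Tensor, task: str) -> torch.Tensor:
        x = audio  # shape: (batch, 1, samples)
        for layer, adapter in zip(self.encoder, self.adapters[task]):
            x = torch.relu(layer(x))
            x = adapter(x)  # task-specific correction of shared features
        return self.heads[task](x.mean(dim=-1))  # pool over time, classify


# Hypothetical tasks: {name: (per-layer adapter widths, number of classes)}.
tasks = {"keyword": ([8, 8, 16, 16], 10), "speaker": ([4, 4, 8, 8], 50)}
model = MultiTaskAudioNet(tasks)
logits = model(torch.randn(2, 1, 16000), task="keyword")
print(logits.shape)  # torch.Size([2, 10])
```

In this sketch only the adapters and heads grow with the number of tasks; the
encoder cost is paid once, which is the resource-sharing property the abstract
emphasizes for on-device audio inference.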