K for the price of 1: Parameter-efficient multi-task and transfer learning

Andrew Howard
International Conference on Learning Representations (2019)

Abstract

In this paper we introduce a novel method that enables parameter-efficient transfer and multi-task learning.
We show that by reusing more than 95\% of the parameters we can re-purpose neural networks to solve very
different types of problems, such as going from COCO-dataset SSD detection to ImageNet classification.
Our approach allows both simultaneous (i.e. multi-task) learning and sequential fine-tuning, in which
an already trained network is changed to solve a different problem.
We show that our approach leads to a significant increase in accuracy compared to traditional logits-only fine-tuning,
while using far fewer parameters. Interestingly, for multi-task learning our approach sometimes acts as a regularizer,
often leading to improved performance compared to models trained on a single task.

Our approach has multiple immediate applications. It can be used to dramatically increase the number of models available in resource-constrained settings, since the marginal cost of a new model is now less than 5\% of the full model. The constrained fine-tuning also enables better generalization when only a limited amount of data is available. We evaluate our approach on multiple datasets and multiple models.
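
To make the parameter accounting above concrete, the following is a minimal sketch (not the authors' exact recipe) of re-purposing a pretrained network by training only a small per-task subset of its weights. It assumes PyTorch with torchvision's MobileNetV2 as a stand-in backbone, and it chooses BatchNorm affine parameters plus the classifier head as the trainable subset; the printed fraction corresponds to the marginal per-model cost discussed above.

\begin{verbatim}
# Sketch: freeze a pretrained backbone and train only a small per-task
# "patch" of parameters (illustrative choice: BatchNorm affine params
# and the classifier head). Requires torchvision >= 0.13 for `weights=`.
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(weights="IMAGENET1K_V1")

# Freeze every parameter by default.
for p in model.parameters():
    p.requires_grad = False

# Unfreeze the small per-task subset: BatchNorm scales/biases ...
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad = True
# ... and the classifier head (the "logits" layer).
for p in model.classifier.parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"per-task trainable fraction: {trainable / total:.2%}")

# Only the trainable subset is handed to the optimizer for the new task;
# the frozen >95% of weights can be shared across all tasks.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.01, momentum=0.9,
)
\end{verbatim}

Under this setup the shared backbone is stored once, and each additional task only needs to store its own small trainable subset, which is what drives the per-model marginal cost well below that of a full copy of the network.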