Implicit biases in multitask and continual learning from a backward error analysis perspective

Mathematics of Modern Machine Learning Workshop at NeurIPS 2023 (2023)

Abstract

We extract implicit training biases in the multitask and continual learning settings for neural networks trained with stochastic gradient descent, using backward error analysis. In particular, we derive modified losses that are implicitly minimized during training. They comprise three terms: the original loss, accounting for convergence; an implicit gradient regularization term, accounting for performance; and a last term, the conflict term, which can in theory be detrimental to both convergence and performance. In the multitask setting, the conflict term is a well-known quantity measuring the gradient alignment between the tasks, while in the continual learning setting the conflict term is a new quantity in deep learning, although well known in many areas of mathematics: the Lie bracket between the task gradients. This work is purely mathematical and illustrates the power of backward error analysis for methodically computing implicit biases in gradient-based optimization in deep learning.
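The three-term structure of the modified loss can be sketched schematically. The display below is an illustrative reconstruction, not the paper's exact derivation: the coefficient $h/4$ on the implicit gradient regularization term follows the standard single-task backward error analysis of gradient descent with learning rate $h$, and the precise form of the conflict term $C(\theta)$ in each setting is only indicated up to constants.

```latex
% Schematic modified loss implicitly minimized by SGD (illustrative):
\[
  \widetilde{L}(\theta)
    = \underbrace{L(\theta)}_{\text{original loss}}
    + \underbrace{\tfrac{h}{4}\,\lVert \nabla L(\theta)\rVert^{2}}_{\text{implicit gradient regularization}}
    + \underbrace{C(\theta)}_{\text{conflict term}} .
\]
% Multitask (tasks trained simultaneously): the conflict term measures
% gradient alignment between the task losses,
\[
  C(\theta) \;\propto\; \sum_{i \neq j} \nabla L_i(\theta) \cdot \nabla L_j(\theta).
\]
% Continual learning (tasks trained sequentially): the conflict term
% involves the Lie bracket of the task-gradient vector fields,
\[
  [\nabla L_1, \nabla L_2](\theta)
    = \nabla^2 L_2(\theta)\,\nabla L_1(\theta)
    - \nabla^2 L_1(\theta)\,\nabla L_2(\theta),
\]
% which vanishes exactly when the two gradient flows commute, i.e. when
% training on task 1 then task 2 agrees (to leading order) with the
% reverse order.
```

The Lie bracket expression above is the standard formula for the bracket of two gradient vector fields; its appearance in the sequential setting reflects the fact that the order of the tasks matters precisely when the bracket is nonzero.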