LEARNING AN INVERTIBLE OUTPUT MAPPING CAN MITIGATE SIMPLICITY BIAS IN NEURAL NETWORKS

Anshul Nasery
Sravanti Addepalli
ICLR 2023 (to appear)

Abstract

Deep Neural Networks (DNNs) are known to be brittle to even minor distribution
shifts relative to the training distribution. Simplicity Bias (SB) of DNNs, their bias
towards learning only a small number of the simplest features, has been demonstrated
to be a key reason for this brittleness. Prior works have shown that the effect of
Simplicity Bias is extreme: even when the learned features are diverse, retraining
the classification head still selects only a few of the simplest features, leading to
similarly brittle models. In this work, we introduce a Feature Reconstruction
Regularizer (FRR) in the linear classification head, with the aim of reducing
Simplicity Bias and thereby improving Out-Of-Distribution (OOD) robustness. The
proposed regularizer, termed FRR-L when used during linear-layer training, enforces
that the features can be reconstructed from the logits, ensuring that diverse
features participate in the classification task. We further propose to finetune the
full network while keeping the weights of the FRR-L-trained linear layer frozen. This
approach, termed FRR-FLFT (Fixed Linear FineTuning), improves the quality of the
learned features, making them more suitable for the classification task.
Using this simple solution, we demonstrate gains of up to 12% in accuracy on
recently introduced synthetic datasets with extreme distribution shifts. Moreover,
on the standard OOD benchmarks recommended in DomainBed, our technique
provides gains of up to 5% over existing SOTA methods.
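To make the objective concrete, below is a minimal PyTorch sketch of the FRR-L idea as described above. The names FRRHead and frr_l_loss, the choice of a single linear decoder from logits back to features, and the trade-off weight lam are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FRRHead(nn.Module):
    # Linear classification head with an auxiliary decoder that tries to
    # reconstruct the backbone features from the logits, encouraging the
    # feature-to-logit mapping to remain (approximately) invertible.
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.decoder = nn.Linear(num_classes, feat_dim)  # assumed linear decoder

    def forward(self, feats: torch.Tensor):
        logits = self.classifier(feats)
        recon = self.decoder(logits)
        return logits, recon

def frr_l_loss(logits, recon, feats, labels, lam: float = 1.0):
    # Standard cross-entropy plus a penalty on feature directions that the
    # logits cannot explain; lam is an assumed trade-off hyperparameter.
    return F.cross_entropy(logits, labels) + lam * F.mse_loss(recon, feats)

In the subsequent FRR-FLFT stage, one would freeze the parameters of head.classifier (e.g., by setting requires_grad = False on them) and finetune only the backbone, matching the fixed-linear finetuning described above.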
