Dual PatchNorm

Neil Houlsby
Transactions on Machine Learning Research (2023) (to appear)


We discover that simply placing two LayerNorms, one before and one after the patch embedding layer, leads to improvements over well-tuned ViT models. In particular, this outperforms an exhaustive search over alternative LayerNorm placement strategies within the transformer block itself.
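
As a rough illustration of the placement described above, the following NumPy sketch applies one LayerNorm to the flattened patches and a second one to the projected tokens. It is not the authors' implementation: the function names (`layer_norm`, `patchify`, `dual_patchnorm_embed`), the image size, the patch size, and the embedding width are all assumptions chosen for the example, and the learned LayerNorm scale and bias are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the last axis (learned scale/bias omitted in this sketch)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def patchify(images, patch_size):
    """Split (B, H, W, C) images into flattened patches of shape (B, N, P*P*C)."""
    b, h, w, c = images.shape
    p = patch_size
    x = images.reshape(b, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # group the two patch-grid axes together
    return x.reshape(b, (h // p) * (w // p), p * p * c)

def dual_patchnorm_embed(images, weight, bias, patch_size):
    """Hypothetical patch embedding with a LayerNorm on each side."""
    patches = patchify(images, patch_size)
    patches = layer_norm(patches)      # LayerNorm before the patch embedding
    tokens = patches @ weight + bias   # linear patch embedding
    return layer_norm(tokens)          # LayerNorm after the patch embedding

# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
patch_size, dim = 8, 64
images = rng.normal(size=(2, 32, 32, 3))
weight = rng.normal(size=(patch_size * patch_size * 3, dim)) * 0.02
bias = np.zeros(dim)

tokens = dual_patchnorm_embed(images, weight, bias, patch_size)
print(tokens.shape)  # (2, 16, 64)
```

The resulting tokens would then be fed to an otherwise unchanged stack of transformer blocks, so the only difference from a standard ViT stem in this sketch is the pair of normalization calls around the projection.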
