A Connection between Actor Regularization and Critic Regularization in Reinforcement Learning

Benjamin Eysenbach
Matthieu Geist
Ruslan Salakhutdinov
Sergey Levine
International Conference on Machine Learning (ICML) (2023)

Abstract

As with any machine learning problem with limited data, effective offline RL
algorithms require careful regularization to avoid overfitting, with most methods
regularizing either the actor or the critic. These methods appear distinct. Actor
regularization (e.g., behavioral cloning penalties) is simpler and has appealing
convergence properties, while critic regularization typically requires significantly
more compute because it involves solving a game, but it has appealing lower-bound
guarantees. Empirically, prior work alternates between claiming better results with
actor regularization and with critic regularization. In this paper, we show that these two
regularization techniques can be equivalent under some assumptions: regularizing
the critic using a CQL-like objective is equivalent to updating the actor with a BC-
like regularizer and with a SARSA Q-value (i.e., “1-step RL”). Our experiments
show that this theoretical model makes accurate, testable predictions about the
performance of CQL and one-step RL. While our results do not definitively say
whether users should prefer actor regularization or critic regularization, they
hint that actor regularization methods may be a simpler way to achieve the desirable
properties of critic regularization. The results also suggest that the empirically
demonstrated benefits of both types of regularization might be more a function of
implementation details than of objective superiority.
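
To make the abstract's central claim concrete, the sketch below (a minimal PyTorch illustration, not the paper's code) writes out the two standard objectives being compared: a CQL-style conservative critic loss, and a one-step actor update that maximizes a SARSA Q-value under a behavioral-cloning penalty. The names (`q_net`, `target_q_net`, `q_sarsa`, `policy`), the batch layout, and the coefficients `alpha` and `bc_weight` are illustrative assumptions; the paper's result is that, under some assumptions, these two forms of regularization can yield the same policy.

```python
# A minimal sketch, assuming generic PyTorch Q-networks and a reparameterizable
# policy distribution; none of these names come from the paper's codebase.
import torch
import torch.nn.functional as F


def cql_style_critic_loss(q_net, target_q_net, policy, batch, alpha=1.0, gamma=0.99):
    """Critic regularization: a TD loss plus a CQL-style penalty that pushes
    down Q-values on actions drawn from the current policy and pushes up
    Q-values on actions that appear in the dataset."""
    s, a, r, s_next, a_next = batch  # logged transitions (SARSA-style)
    with torch.no_grad():
        td_target = r + gamma * target_q_net(s_next, a_next)
    td_loss = F.mse_loss(q_net(s, a), td_target)
    # Conservative gap: E_{a'~pi}[Q(s, a')] - E_{(s,a)~D}[Q(s, a)].
    policy_actions = policy(s).sample()
    conservative_gap = q_net(s, policy_actions).mean() - q_net(s, a).mean()
    return td_loss + alpha * conservative_gap


def one_step_actor_loss(q_sarsa, policy, batch, bc_weight=1.0):
    """Actor regularization ("1-step RL"): improve the policy against a SARSA
    Q-value (the behavior policy's Q-function) while a behavioral-cloning term
    keeps the policy close to the data."""
    s, a, *_ = batch
    dist = policy(s)                       # assumed to support rsample/log_prob
    q_term = q_sarsa(s, dist.rsample()).mean()
    bc_term = -dist.log_prob(a).mean()     # BC-like regularizer
    return -q_term + bc_weight * bc_term
```

The first loss regularizes only the critic and leaves the actor update unconstrained; the second trains an unregularized SARSA critic and puts the regularization on the actor. The paper's theoretical model relates the policies obtained from these two recipes.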