A Connection between Actor Regularization and Critic Regularization in Reinforcement Learning

Benjamin Eysenbach
Matthieu Geist
Ruslan Salakhutdinov
Sergey Levine
International Conference on Machine Learning (ICML) (2023)

Abstract

As with any machine learning problem with limited data, effective offline RL
algorithms require careful regularization to avoid overfitting, with most methods
regularizing either the actor or the critic. These methods appear distinct. Actor
regularization (e.g., behavioral cloning penalties) is simpler and has appealing
convergence properties, while critic regularization typically requires significantly
more compute because it involves solving a game, but it has appealing lower-bound
guarantees. Empirically, prior work alternates between claiming better results with
actor regularization and with critic regularization. In this paper, we show that these two
regularization techniques can be equivalent under some assumptions: regularizing
the critic using a CQL-like objective is equivalent to updating the actor with a BC-
like regularizer and with a SARSA Q-value (i.e., “1-step RL”). Our experiments
show that this theoretical model makes accurate, testable predictions about the
performance of CQL and one-step RL. While our results do not definitively say
whether users should prefer actor regularization or critic regularization, they
hint that actor regularization methods may be a simpler way to achieve the desirable
properties of critic regularization. The results also suggest that the empirically
demonstrated benefits of both types of regularization might be more a function of
implementation details than of objective superiority.
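
To make the abstract's central claim concrete, the sketch below (a minimal PyTorch illustration, not the paper's code) writes out the two standard objectives being compared: a CQL-style conservative critic loss, and a one-step actor update that maximizes a SARSA Q-value under a behavioral-cloning penalty. The names (`q_net`, `target_q_net`, `q_sarsa`, `policy`), the batch layout, and the coefficients `alpha` and `bc_weight` are illustrative assumptions; the paper's result is that, under some assumptions, these two forms of regularization can yield the same policy.

```python
# A minimal sketch, assuming generic PyTorch Q-networks and a reparameterizable
# policy distribution; none of these names come from the paper's codebase.
import torch
import torch.nn.functional as F


def cql_style_critic_loss(q_net, target_q_net, policy, batch, alpha=1.0, gamma=0.99):
    """Critic regularization: a TD loss plus a CQL-style penalty that pushes
    down Q-values on actions drawn from the current policy and pushes up
    Q-values on actions that appear in the dataset."""
    s, a, r, s_next, a_next = batch  # logged transitions (SARSA-style)
    with torch.no_grad():
        td_target = r + gamma * target_q_net(s_next, a_next)
    td_loss = F.mse_loss(q_net(s, a), td_target)
    # Conservative gap: E_{a'~pi}[Q(s, a')] - E_{(s,a)~D}[Q(s, a)].
    policy_actions = policy(s).sample()
    conservative_gap = q_net(s, policy_actions).mean() - q_net(s, a).mean()
    return td_loss + alpha * conservative_gap


def one_step_actor_loss(q_sarsa, policy, batch, bc_weight=1.0):
    """Actor regularization ("1-step RL"): improve the policy against a SARSA
    Q-value (the behavior policy's Q-function) while a behavioral-cloning term
    keeps the policy close to the data."""
    s, a, *_ = batch
    dist = policy(s)                       # assumed to support rsample/log_prob
    q_term = q_sarsa(s, dist.rsample()).mean()
    bc_term = -dist.log_prob(a).mean()     # BC-like regularizer
    return -q_term + bc_weight * bc_term
```

The first loss regularizes only the critic and leaves the actor update unconstrained; the second trains an unregularized SARSA critic and puts the regularization on the actor. The paper's theoretical model relates the policies obtained from these two recipes.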