Finding Safe Zones of policies Markov Decision Processes

Lee Cohen
Michal Moshkovitz
NeurIPS 2023 (2023)

Abstract

Given a policy, we define a {\newprob}%\emph{safe zone}
as a subset of states, such that most of the policy's trajectories are confined to this subset.
The quality of the {\newprob }%safe zone
is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. Safe zones are especially interesting when they have a small number of states and low escape probability.

We study the complexity of finding optimal {\textsc{safeZones}},
and show that in general the problem is computationally hard. For this reason we concentrate on computing approximate {\textsc{safeZones}.}
Our main result is a bi-criteria approximation algorithm which gives a factor of almost $2$ approximation for both the escape probability and safe zone size, using a polynomial size sample complexity.
We conclude the paper with an empirical evaluation of our algorithm.

Research Areas