Action-Gap Phenomenon in Reinforcement Learning

This video was recorded at Video Journal of Machine Learning Abstracts - Volume 2. Many reinforcement learning practitioners have observed that an agent's performance often comes very close to optimal even while its estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of action-gap regularity. As a typical result, we prove that for an agent following the greedy policy \hat{\pi} with respect to an action-value function \hat{Q}, the performance loss E[V^*(X) - V^{\hat{\pi}}(X)] is upper bounded by O(||\hat{Q} - Q^*||_\infty^{1+\zeta}), in which \zeta \geq 0 is the parameter quantifying the action-gap regularity. For \zeta > 0, our results indicate a smaller performance loss than previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
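The intuition behind the abstract can be illustrated with a small sketch, which is not taken from the paper. It relies on one elementary fact: if every entry of \hat{Q} is within eps of Q^*, then in any state whose action gap (best minus second-best optimal action value) exceeds 2 * eps, the greedy action under \hat{Q} is still the optimal one. The table Q_STAR, the error level 0.1, and all function names below are hypothetical and chosen only for illustration.

```python
import random

def greedy(q_row):
    """Index of the largest action value (ties go to the lowest index)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])

def action_gap(q_row):
    """Difference between the best and second-best action value."""
    top = sorted(q_row, reverse=True)
    return top[0] - top[1]

def perturb(q, eps, rng):
    """Copy of q with every entry shifted by at most eps in absolute value."""
    return [[v + rng.uniform(-eps, eps) for v in row] for row in q]

def mismatched_states(q_star, q_hat):
    """States where the greedy action under q_hat differs from that under q_star."""
    return [x for x, (r_star, r_hat) in enumerate(zip(q_star, q_hat))
            if greedy(r_star) != greedy(r_hat)]

# Hypothetical optimal action values: 3 states, 2 actions.
# Action gaps are roughly 0.8, 0.4 and 0.05.
Q_STAR = [[1.0, 0.2], [0.5, 0.9], [0.30, 0.35]]

if __name__ == "__main__":
    eps = 0.1  # sup-norm error of the estimate (assumed for this example)
    q_hat = perturb(Q_STAR, eps, random.Random(0))
    for x, row in enumerate(Q_STAR):
        print(f"state {x}: gap = {action_gap(row):.2f}")
    # Only states whose gap is at most 2 * eps can appear in this list.
    print("states with a wrong greedy action:", mismatched_states(Q_STAR, q_hat))
```

In this toy table only the third state, whose gap is about 0.05, can ever receive a wrong greedy action at eps = 0.1, and choosing wrongly there costs little because the two actions are nearly equally good. This is the qualitative effect the abstract describes; the sketch does not reproduce the paper's bound or its definition of \zeta.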

