Autonomous Exploration in Reinforcement Learning

This video was recorded at NIPS Workshops, Sierra Nevada 2011. One of the striking differences between current reinforcement learning algorithms and early learning in animals and humans is that animals and infants appear to explore their environments with autonomous purpose, in a manner appropriate to their current level of skill. To analyse such autonomous exploration theoretically, an evaluation criterion for comparing exploration algorithms is required. Unfortunately, no commonly agreed evaluation criterion has been established yet.

As one possible criterion, we consider in this work the navigation skill of a learning agent after a given number of exploration steps. In particular, we consider how many exploration steps are required until the agent has learned reliable policies for reaching all states within a certain distance of a start state. (Related but more general objectives are also of interest.) While this learning problem can be addressed in a straightforward manner for finite MDPs, it becomes much more interesting for potentially infinite (but discrete) MDPs. For infinite MDPs, we can analyse how the agent's navigation skill extends to more distant states as the exploration time increases. We show that an optimistic exploration strategy learns reliable policies when the number of exploration steps is linear in the number of reachable states and in the number of actions. The number of reachable states is not known to the algorithm, but the algorithm adapts to it.
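To make the evaluation criterion more concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the talk's. The environment is a hypothetical infinite, deterministic grid (grid_step, ACTIONS), the agent may reset to the start state, and the explorer is a simple optimism-driven graph search rather than the algorithm analysed in the work, whose setting is stochastic. The agent treats every untried action at a state closer than max_dist to the start as worth trying. It walks along its learned model to the nearest such state and stops once every such state has had all actions tried. At that point the learned shortest-path distances are exact for all states within max_dist steps, a deterministic analogue of having reliable policies for reaching them. The names explore, known_distances and path_to_nearest are illustrative.

```python
from collections import deque

# Hypothetical toy environment (an assumption, not from the talk): an infinite,
# deterministic grid on Z^2 with four moves and a reset to the start state.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def grid_step(state, action):
    """Deterministic transition of the hypothetical infinite grid."""
    return (state[0] + action[0], state[1] + action[1])


def known_distances(model, start):
    """BFS step counts from start that use only transitions observed so far."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for s_next in model.get(s, {}).values():
            if s_next not in dist:
                dist[s_next] = dist[s] + 1
                queue.append(s_next)
    return dist


def path_to_nearest(model, source, targets):
    """Shortest known action sequence from source to any state in targets, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        if s in targets:
            actions = []
            while parent[s] is not None:  # walk back to source, collecting actions
                s, a = parent[s]
                actions.append(a)
            return actions[::-1]
        for a, s_next in model.get(s, {}).items():
            if s_next not in parent:
                parent[s_next] = (s, a)
                queue.append(s_next)
    return None


def explore(start, step, max_dist):
    """Explore optimistically until all states within max_dist steps are certified.

    Returns (distances of those states, number of environment steps used).
    """
    model = {}  # model[s][a] = successor observed after taking a in s
    state, steps = start, 0
    while True:
        dist = known_distances(model, start)
        # Optimism: any untried action at a state closer than max_dist might
        # reveal a new or shorter route, so such states are exploration targets.
        frontier = {s for s, d in dist.items()
                    if d < max_dist and len(model.get(s, {})) < len(ACTIONS)}
        if not frontier:
            return {s: d for s, d in dist.items() if d <= max_dist}, steps
        path = path_to_nearest(model, state, frontier)
        if path is None:
            # Hypothetical reset action: every frontier state is reachable from start.
            state, steps = start, steps + 1
            path = path_to_nearest(model, state, frontier)
        for a in path:  # follow the learned (deterministic) model
            state, steps = step(state, a), steps + 1
        untried = next(a for a in ACTIONS if a not in model.get(state, {}))
        model.setdefault(state, {})[untried] = step(state, untried)
        state, steps = model[state][untried], steps + 1


if __name__ == "__main__":
    dists, used = explore((0, 0), grid_step, max_dist=3)
    print(len(dists), "states certified within 3 steps after", used, "steps")
```

Calling explore((0, 0), grid_step, max_dist=3) should certify the 25 grid cells with |x| + |y| <= 3. The sketch only illustrates the criterion; it does not attempt to reproduce the linear bound stated in the abstract.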