The optimistic principle for online planning in Markov decision processes

This video was recorded at the Large-scale Online Learning and Decision Making (LSOLDM) Workshop, Cumberland Lodge, 2012. Given an initial state, what is the best possible action that a planning algorithm can return when it is given a finite numerical budget (e.g. a number of calls to a model of the state-transition and reward functions)? We investigate optimistic strategies and provide regret bounds in terms of a new measure of the complexity of the planning problem.
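To make the idea of an optimistic strategy under a finite budget concrete, here is a minimal sketch. The abstract does not specify the algorithms covered in the talk, so everything in it is an illustrative assumption: a deterministic model, rewards in [0, 1], a discount factor `gamma`, and a budget counted in node expansions. The names `optimistic_plan` and `toy_model` are hypothetical. The planner always expands the leaf whose optimistic value (discounted reward collected so far plus the largest return still possible) is highest, then returns the first action of the best sequence found.

```python
import heapq
from itertools import count


def optimistic_plan(state, actions, model, budget, gamma=0.9):
    """Return a first action after at most `budget` node expansions.

    Assumptions (not taken from the talk): `model(state, action)` is
    deterministic and returns (next_state, reward) with reward in [0, 1].
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    # Heap entries: (-b_value, tie, u_value, depth, state, first_action).
    # b_value is the optimistic upper bound on the return through that leaf.
    frontier = [(-1.0 / (1.0 - gamma), next(tie), 0.0, 0, state, None)]
    best_u, best_action = float("-inf"), None

    for _ in range(budget):
        if not frontier:
            break
        # Pop the leaf with the largest optimistic bound.
        _, _, u, depth, s, first = heapq.heappop(frontier)
        for a in actions:
            s2, r = model(s, a)
            u2 = u + (gamma ** depth) * r  # discounted reward so far
            first2 = a if first is None else first
            # Future rewards are at most 1 per step, hence the bound below.
            b2 = u2 + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(frontier, (-b2, next(tie), u2, depth + 1, s2, first2))
            if u2 > best_u:
                best_u, best_action = u2, first2
    return best_action


def toy_model(state, action):
    """Hypothetical chain: moving right (+1) pays 1, moving left (-1) pays 0."""
    return state + action, 1.0 if action == 1 else 0.0


chosen = optimistic_plan(0, (-1, 1), toy_model, budget=10)
```

In this sketch each expansion costs one model call per action, so the total number of model calls is at most `budget * len(actions)`. Stochastic transitions, which a general Markov decision process allows, would need a different bound and are not handled here.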
