Material Detail

A simple multi-armed bandit algorithm with optimal variation-bounded regret

A simple multi-armed bandit algorithm with optimal variation-bounded regret

This video was recorded at 24th Annual Conference on Learning Theory (COLT), Budapest 2011. We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of O(√QlogT), where Q is the total quadratic variation of all the arms.

Quality

  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material

Comments

Log in to participate in the discussions or sign up if you are not already a MERLOT member.