Free Energy and Relative Entropy Dualities: Connections to Path Integral Control and Applications to Robotics

This video was recorded at the Workshop on Statistical Physics of Inference and Control Theory, Granada 2012. While optimal control and reinforcement learning are fundamental frameworks for learning and control applications, applying them to high-dimensional control systems as complex as humanoid and biomimetic robots has largely been impossible so far. One key problem is that classical value function-based approaches run into severe limitations in continuous state-action spaces because of the difficulty of value function approximation. In addition, the computational cost and time required to explore high-dimensional state-action spaces quickly exceed practical feasibility. As an alternative, researchers have turned to trajectory-based reinforcement learning, which sacrifices global optimality in favor of being applicable to high-dimensional state-action spaces. Model-based algorithms, inspired by ideas of differential dynamic programming, have demonstrated some success when models are accurate. Model-free trajectory-based reinforcement learning has been limited by slow learning and the need to tune many open parameters. Recently, reinforcement learning has moved towards combining classical techniques from stochastic optimal control and dynamic programming with learning techniques from statistical estimation theory and the connection between SDEs and PDEs via the Feynman-Kac Lemma. In this talk, I will discuss theoretical developments and extensions of path integral control to iterative cases and present algorithms for policy improvement in continuous state-action spaces. I will provide information-theoretic interpretations and extensions based on the fundamental relationship between free energy and relative entropy. This relationship provides an alternative view of stochastic optimal control theory that does not rely on the Bellman principle.
I will demonstrate the applicability of the proposed algorithms to control and learning of humanoid, manipulator, and tendon-driven robots, and propose future directions in terms of theory and applications.
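
The relationship between free energy and relative entropy mentioned in the abstract can be illustrated on a small discrete example. The sketch below is not taken from the talk; it assumes the standard Legendre-type form of the duality, -λ log E_p[exp(-J/λ)] = min_q { E_q[J] + λ KL(q‖p) }, with the minimum attained by the Gibbs-reweighted distribution q* ∝ p exp(-J/λ). The sign convention, the temperature `lam`, and the three-outcome distribution `p` with costs `cost` are illustrative assumptions, and all function names are hypothetical.

```python
import math

def free_energy(p, cost, lam):
    # Free energy of the cost under the base distribution p:
    # -lam * log E_p[exp(-cost / lam)]   (sign convention is an assumption)
    return -lam * math.log(sum(pi * math.exp(-c / lam) for pi, c in zip(p, cost)))

def relative_entropy(q, p):
    # KL(q || p) for discrete distributions; terms with q_i = 0 contribute 0.
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def objective(q, p, cost, lam):
    # Expected cost under q plus lam times the relative entropy to p.
    expected_cost = sum(qi * c for qi, c in zip(q, cost))
    return expected_cost + lam * relative_entropy(q, p)

def gibbs_distribution(p, cost, lam):
    # Candidate minimizer: q* proportional to p * exp(-cost / lam).
    weights = [pi * math.exp(-c / lam) for pi, c in zip(p, cost)]
    z = sum(weights)
    return [w / z for w in weights]

# Illustrative (made-up) numbers: three outcomes standing in for trajectories.
p = [0.5, 0.3, 0.2]      # base (uncontrolled) distribution
cost = [1.0, 2.0, 0.5]   # cost of each outcome
lam = 0.7                # temperature

q_star = gibbs_distribution(p, cost, lam)
f = free_energy(p, cost, lam)

print("free energy:           ", f)
print("objective at q*:       ", objective(q_star, p, cost, lam))
print("objective at q = p:    ", objective(p, p, cost, lam))
print("objective at another q:", objective([0.2, 0.2, 0.6], p, cost, lam))
```

In this sketch the objective evaluated at q* matches the free energy up to rounding, while any other distribution gives a value at least as large. This is the sense in which the optimal cost can be characterized without appealing to the Bellman principle.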

