Material Detail

Using Fast Weights to Improve Persistent Contrastive Divergence

Using Fast Weights to Improve Persistent Contrastive Divergence

This video was recorded at 26th International Conference on Machine Learning (ICML), Montreal 2009. The most commonly used learning algorithm for restricted Boltzmann machines is contrastive divergence which starts a Markov chain at a data point and runs the chain for only a few iterations to get a cheap, low variance estimate of the sufficient statistics under the model. Tieleman (2008) showed that better learning can be achieved by estimating the model's statistics using a small set of persistent "fantasy particles" that are not reinitialized to data points after each weight update. With sufficiently small weight updates, the fantasy particles represent the equilibrium distribution accurately but to explain why the method works with much larger weight updates it is necessary to consider the interaction between the weight updates and the Markov chain. We show that the weight updates force the Markov chain to mix fast, and using this insight we develop an even faster mixing chain that uses an auxiliary set of "fast weights" to implement a temporary overlay on the energy landscape. The fast weights learn rapidly but also decay rapidly and do not contribute to the normal energy landscape that defines the model.


  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material


Log in to participate in the discussions or sign up if you are not already a MERLOT member.