
Gradient Methods for Machine Learning


This video was recorded at the Machine Learning Summer School (MLSS), Canberra 2005. Gradient methods locally optimize an unknown differentiable function, and thus provide the engines that drive much of machine learning. Here we'll take a look under the hood, beginning with a brief overview of classical gradient methods for unconstrained optimization:

  • Steepest descent
  • Newton's method
  • Levenberg-Marquardt
  • BFGS
  • Conjugate gradient

To cope with the flood of data we find ourselves in today, stochastic approximation of the gradient from subsamples of the data becomes a necessity. Unfortunately, the noise this introduces into the gradient is not tolerated well by the classical gradient methods; the exception is steepest descent, which, however, converges very slowly. We'll see how local step-size adaptation can be used to accelerate the convergence of stochastic gradient descent, culminating in the recent stochastic meta-descent (SMD) algorithm. SMD requires certain Hessian-vector products, which can be computed efficiently via algorithmic (or automatic) differentiation (AD), a set of techniques that help automate the correct implementation of gradient methods in general. We'll discuss the basic concepts of AD and learn simple ways to implement the forward mode of AD, and with it the fast Hessian-vector product.
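To make the last point concrete, here is a minimal sketch (not the lecture's own code) of forward-mode AD via dual numbers, used to compute a Hessian-vector product. The idea is that Hv = d/dε ∇f(w + εv) at ε = 0, so running forward mode through the gradient function yields Hv in a single pass, without ever forming the Hessian. The function `f` and all names below are illustrative choices, not taken from the talk.

```python
class Dual:
    """Minimal forward-mode AD scalar: a value plus a directional derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule propagates the directional derivative.
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def grad_f(w):
    """Hand-coded gradient of f(w) = 1.5*w0^2 + w0*w1 + 2*w1^2."""
    w0, w1 = w
    return [3 * w0 + w1, w0 + 4 * w1]

def hessian_vector(grad, w, v):
    """Hv = d/de grad(w + e*v) at e=0: seed each input with its v-component."""
    duals = [Dual(wi, vi) for wi, vi in zip(w, v)]
    return [g.dot for g in grad(duals)]

# The Hessian of f is [[3, 1], [1, 4]], so with v = [1, -1] we expect Hv = [2, -3].
print(hessian_vector(grad_f, [1.0, 2.0], [1.0, -1.0]))  # → [2.0, -3.0]
```

In SMD, a product of exactly this form drives the per-parameter step-size updates; the point of the AD construction is that its cost is only a small constant factor times the cost of one gradient evaluation.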




