Rapid Stochastic Gradient Descent: Accelerating Machine Learning

This video was recorded at Machine Learning Summer School (MLSS), Canberra 2006. The incorporation of online learning capabilities into real-time computing systems has been hampered by a lack of efficient, scalable optimization algorithms for this purpose: second-order methods are too expensive for large, nonlinear models, conjugate gradient does not tolerate the noise inherent in online learning, and simple gradient descent, evolutionary algorithms, etc., are unacceptably slow to converge. I am addressing this problem by developing new ways to accelerate stochastic gradient descent, using second-order gradient information obtained through the efficient computation of curvature matrix-vector products. In the stochastic meta-descent (SMD) algorithm, this cheap curvature information is built up iteratively into a stochastic approximation of Levenberg-Marquardt second-order gradient steps, which are then used to adapt individual gradient step sizes. SMD handles noisy, correlated, non-stationary signals well, and approaches the rapid convergence of second-order methods at only linear cost per iteration, thus scaling up to extremely large nonlinear systems. To date it has enabled new adaptive techniques in computational fluid dynamics and computer vision. Our most recent development is a version of SMD operating in reproducing kernel Hilbert space.
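The abstract describes the core of SMD: individual gradient step sizes adapted by a stochastic meta-gradient that is driven by cheap curvature matrix-vector products. The following Python sketch illustrates that scheme on a toy stochastic least-squares stream; the toy problem, the variable names, and the hyperparameter values (mu, lam, rho) are illustrative assumptions, not details taken from the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 20
    w_true = rng.normal(size=d)

    def sample_example():
        # Draw one noisy example from a linear model (toy stand-in for a data stream).
        x = rng.normal(size=d)
        return x, x @ w_true + 0.1 * rng.normal()

    w = np.zeros(d)              # model parameters
    eta = np.full(d, 0.01)       # individual (per-parameter) step sizes
    v = np.zeros(d)              # d w / d log(eta), accumulated with decay lam
    mu, lam, rho = 0.02, 0.99, 0.5   # meta-step, decay, lower clamp (assumed values)

    for t in range(5000):
        x, y = sample_example()
        err = x @ w - y
        g = err * x                  # stochastic gradient of 0.5 * err**2
        Cv = (x @ v) * x             # Gauss-Newton curvature-vector product, O(d) per example

        # Meta-gradient step in log(eta): grow a step size where successive gradients
        # agree, shrink it where they oscillate; clamp below by rho for stability.
        eta *= np.maximum(rho, 1.0 - mu * g * v)

        w -= eta * g                         # SGD step with the adapted step sizes
        v = lam * v - eta * (g + lam * Cv)   # recurrence driven by the curvature product

    print("squared parameter error:", float(np.sum((w - w_true) ** 2)))

In this sketch the curvature-vector product costs only O(d) per example, which is what keeps the per-iteration cost linear in the number of parameters, in line with the scaling claim made in the abstract.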
