Fully Distributed EM for Very Large Datasets

This video was recorded at the 25th International Conference on Machine Learning (ICML), Helsinki, 2008. In EM and related algorithms, E-step computations distribute easily because data items are independent given the parameters. For very large data sets, however, even storing all of the parameters on a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with the parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce approach on two tasks: word alignment and topic modeling.
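To make the starting point concrete, the following is a minimal sketch of the MapReduce-style pattern the abstract compares against, not the junction-tree framework presented in the talk. It assumes a toy one-dimensional Gaussian mixture, and every name in it (`e_step_shard`, `reduce_stats`, `m_step`, `distributed_em`) is hypothetical. Each shard computes expected sufficient statistics independently, which is why the E-step distributes easily; the M-step, however, still needs every parameter in one place, which is the bottleneck the talk addresses.

```python
import math


def e_step_shard(shard, weights, means, variances):
    """E-step on one shard: accumulate expected sufficient statistics.

    Items are independent given the parameters, so each shard can be
    processed on a separate node without talking to the others.
    """
    k = len(weights)
    stats = {"n": [0.0] * k, "sx": [0.0] * k, "sxx": [0.0] * k}
    for x in shard:
        # Weighted Gaussian density of x under each component.
        dens = [
            weights[j]
            * math.exp(-0.5 * (x - means[j]) ** 2 / variances[j])
            / math.sqrt(2.0 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for x
            stats["n"][j] += r
            stats["sx"][j] += r * x
            stats["sxx"][j] += r * x * x
    return stats


def reduce_stats(all_stats):
    """Reduce step: sum the per-shard statistics component by component."""
    k = len(all_stats[0]["n"])
    merged = {"n": [0.0] * k, "sx": [0.0] * k, "sxx": [0.0] * k}
    for stats in all_stats:
        for key in merged:
            for j in range(k):
                merged[key][j] += stats[key][j]
    return merged


def m_step(stats, min_variance=1e-6):
    """Centralized M-step: needs the statistics for all parameters at once."""
    total = sum(stats["n"])
    weights = [n / total for n in stats["n"]]
    means = [sx / n for sx, n in zip(stats["sx"], stats["n"])]
    variances = [
        max(sxx / n - m * m, min_variance)  # floor avoids a zero variance
        for sxx, n, m in zip(stats["sxx"], stats["n"], means)
    ]
    return weights, means, variances


def distributed_em(shards, weights, means, variances, iterations=20):
    """Alternate sharded E-steps with a single-node M-step."""
    for _ in range(iterations):
        # In a real deployment each call would run on a different node.
        partial = [e_step_shard(s, weights, means, variances) for s in shards]
        weights, means, variances = m_step(reduce_stats(partial))
    return weights, means, variances


shards = [[0.1, -0.2, 0.05], [5.1, 4.9, 5.2], [0.0, 5.0]]
weights, means, variances = distributed_em(
    shards, [0.5, 0.5], [-1.0, 6.0], [1.0, 1.0]
)
```

In this toy setting the parameter vector is tiny, so the single-node M-step is harmless. For models such as word alignment or topic modeling, the number of parameters grows with the vocabulary, which is where, per the abstract, keeping only the parameters relevant to each node's data becomes attractive.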

