Material Detail
Fully Distributed EM for Very Large Datasets
This video was recorded at 25th International Conference on Machine Learning (ICML), Helsinki 2008. In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework which fully distributes the entire EM procedure. Each node interacts with only parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce approach, on two tasks: word alignment and topic modeling.
Quality
- User Rating
- Comments
- Learning Exercises
- Bookmark Collections
- Course ePortfolios
- Accessibility Info