Material Detail

Document Clustering via Dirichlet Process Mixture Model with Feature Selection

Document Clustering via Dirichlet Process Mixture Model with Feature Selection

This video was recorded at 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Washington 2010. One essential issue of document clustering is to estimate the appropriate number of clusters for a document collection to which documents should be partitioned. In this paper, we propose a novel approach, namely DPMFS, to address this issue. The proposed approach is designed 1) to group documents into a set of clusters while the number of document clusters is determined by the Dirichlet process mixture model automatically; 2) to identify the discriminative words and separate them from irrelevant noise words via stochastic search variable selection technique. We explore the performance of our proposed approach on both a synthetic dataset and several realistic document datasets. The comparison between our proposed approach and stage-of-the-art document clustering approaches indicates that our approach is robust and effective for document clustering.

Quality

  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material

Comments

Log in to participate in the discussions or sign up if you are not already a MERLOT member.