Characterization of Linkage Based Clustering

This video was recorded at NIPS Workshops, Whistler 2009.

There is a wide variety of clustering algorithms that, when run on the same data, often produce very different clusterings. Yet there is no principled method to guide the selection of a clustering algorithm. The choice of an appropriate clustering is, of course, task dependent, so we must rely on domain knowledge. The challenge is to communicate such knowledge between the domain expert and the algorithm designer.

One approach to guiding clustering users in the selection of an algorithm is to identify important properties that a user may want an algorithm to satisfy, and to determine which algorithms satisfy each of these properties. Users can then draw on prior knowledge to decide which properties make sense for their application. Ultimately, a sufficiently rich set of properties would provide detailed guidelines for a wide variety of clustering users. For a property to be useful, a user needs to be able to easily determine whether the property is desirable. Such a description of clustering algorithms would yield principled guidelines: a user could select an algorithm by answering a series of simple questions.

Bosagh Zadeh and Ben-David [1] make progress in this direction by providing a set of abstract properties that characterize single linkage. In this work, we give another result in the same direction by characterizing a family of clustering algorithms. These are initial steps toward the ambitious program of developing broad guidelines for clustering algorithm selection.

Linkage-based clustering is one of the most commonly used and widely studied clustering paradigms. We provide a surprisingly simple set of properties that uniquely identify linkage-based clustering algorithms. Our characterization highlights how linkage-based algorithms compare to other clustering algorithms.

Combining previously proposed properties with our newly proposed ones, we show how these properties partition the space of commonly used clustering algorithms. Specifically, we show which of these properties are satisfied by common linkage-based, centroid-based, and spectral clustering algorithms. We hope that this analysis, as well as our characterization of linkage-based clustering, will provide useful guidelines for users in selecting clustering algorithms.
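
To make the term concrete, here is a minimal sketch of linkage-based clustering in the usual agglomerative sense: start from singleton clusters and repeatedly merge the two clusters that are closest under a linkage function. It is an illustration only, not the formal definition used in the talk. The function names (`linkage_cluster`, `single_linkage`, `complete_linkage`) and the one-dimensional data are assumptions made for this example.

```python
from itertools import combinations

def linkage_cluster(points, k, linkage):
    """Merge the two closest clusters (as judged by `linkage`) until k remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        # Find the pair of cluster indices (i < j) with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return sorted(sorted(c) for c in clusters)

def single_linkage(a, b):
    # Distance between the closest pair of points, one from each cluster.
    return min(abs(x - y) for x in a for y in b)

def complete_linkage(a, b):
    # Distance between the farthest pair of points, one from each cluster.
    return max(abs(x - y) for x in a for y in b)

# Hypothetical 1-D data: a chain of points with slowly growing gaps.
points = [0, 10, 21, 33, 46, 60]

single_result = linkage_cluster(points, 2, single_linkage)
complete_result = linkage_cluster(points, 2, complete_linkage)

print(single_result)    # [[0, 10, 21, 33, 46], [60]]
print(complete_result)  # [[0, 10, 21, 33], [46, 60]]
```

On this data the two linkage functions disagree: single linkage chains almost every point into one cluster, while complete linkage produces a more balanced split. This is a small instance of the opening observation that different algorithms can yield very different clusterings of the same data.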
