Persistence-based Clustering

This video was recorded at the Solomon seminar. Clustering is a classical problem that looks for important segments in an unstructured data set. In general, this is an ill-posed problem. A common approach is to consider the data set as a sample of an unknown probability distribution function on some underlying space. Clustering then becomes a problem of understanding the behaviour of the distribution function. In this talk, I will introduce persistence-based clustering. Under some mild assumptions, the algorithm comes with a variety of strong theoretical guarantees. In particular, it provably approximates the structure of the underlying distribution function even when the underlying space is only approximately known. The approach is based heavily on persistent homology (also referred to as topological persistence), a relatively recent development in the area of computational topology. It is precisely this framework that makes many of the proofs possible. The talk will include a general introduction to persistence, so no prior knowledge is expected. On the practical side, the algorithm is efficient in both memory use and running time, so it can handle large, high-dimensional data sets quickly. Finally, it provides visual feedback in addition to the clusters, which is particularly useful when the data sets cannot be visualized.
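The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to cluster by 0-dimensional persistence, not necessarily the method presented in the talk. It assumes a density estimate is already available at each data point and that a neighborhood graph has been built; the names `persistence_clusters`, `density`, `neighbors`, and `tau` are hypothetical. Points are visited in order of decreasing density, each point joins the cluster of its densest already-visited neighbor, and two clusters that meet are merged when the lower peak rises less than `tau` above the meeting point.

```python
def persistence_clusters(density, neighbors, tau):
    """Label each point with the index of its cluster's density peak.

    density   -- list of estimated density values, one per point (assumed given)
    neighbors -- list of neighbor index lists (assumed neighborhood graph)
    tau       -- persistence threshold: peaks less prominent than tau are merged
    """
    # Visit points from highest to lowest estimated density.
    order = sorted(range(len(density)), key=lambda i: -density[i])
    parent = {}  # union-find over visited points; each root is a cluster peak

    def find(i):
        # Follow parent links to the root, halving paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        visited = [j for j in neighbors[i] if j in parent]
        if not visited:
            parent[i] = i  # local density maximum: a new cluster is born
            continue
        # Attach i to the cluster of its densest visited neighbor.
        best = max(visited, key=lambda j: density[j])
        parent[i] = find(best)
        # Point i may connect several clusters; merge the low-persistence ones.
        for j in visited:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            low, high = sorted((ri, rj), key=lambda r: density[r])
            if density[low] - density[i] < tau:
                parent[low] = high  # the lower peak dies at level density[i]
    return [find(i) for i in range(len(density))]


# Toy example: seven points on a line with two density peaks (indices 1 and 5).
density = [1.0, 3.0, 2.0, 0.5, 2.5, 5.0, 4.0]
chain = [[j for j in (i - 1, i + 1) if 0 <= j < len(density)]
         for i in range(len(density))]

labels_fine = persistence_clusters(density, chain, tau=1.0)    # keeps both peaks
labels_coarse = persistence_clusters(density, chain, tau=3.0)  # merges them
```

In this toy example the smaller peak has height 3.0 and meets the larger cluster at level 0.5, so its persistence is 2.5: it survives as its own cluster with `tau=1.0` and is absorbed with `tau=3.0`. Recording each such (peak height, merge level) pair would give a persistence diagram, which is one plausible form of the visual feedback mentioned above.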
