Material Detail

Information Theoretic Measures for Clusterings Comparison: Is a Correction for Chance Necessary?

This video was recorded at the 26th International Conference on Machine Learning (ICML), Montreal 2009. Information theoretic measures form a fundamental class of similarity measures for comparing clusterings, alongside the pair-counting and set-matching classes. In this paper, we discuss whether information theoretic measures for clusterings comparison need a correction for chance. We observe that the baseline for such measures, i.e. their average value under random partitioning of a data set, is not constant, and tends to vary more when the ratio between the number of data points and the number of clusters is small. A similar effect occurs in some non-information theoretic measures, such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive an analytical formula for the expected mutual information between a pair of clusterings, and then propose adjusted versions of several popular information theoretic measures. Some examples are given to demonstrate the need for, and usefulness of, the adjusted measures.
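The chance correction described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard hypergeometric form of the expected mutual information for two clusterings with fixed cluster sizes, and it normalizes by the larger of the two entropies, which is only one of several possible normalizations. The function names are illustrative and not taken from the talk, and the exact integer arithmetic is intended for small data sets only.

```python
from collections import Counter
from math import comb, log


def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information (in nats) between two clusterings of the same items."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    joint = Counter(zip(u, v))  # contingency table, non-empty cells only
    return sum((nij / n) * log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())


def expected_mutual_information(u, v):
    """Expected MI when cluster sizes are fixed and items are assigned at
    random, so each contingency cell follows a hypergeometric distribution."""
    n = len(u)
    emi = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            # nij = 0 contributes nothing, so the sum starts at 1
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                p = comb(bj, nij) * comb(n - bj, ai - nij) / comb(n, ai)
                emi += p * (nij / n) * log(n * nij / (ai * bj))
    return emi


def adjusted_mutual_information(u, v):
    """Chance-adjusted MI, normalized by max(H(U), H(V)) (an assumption)."""
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    denom = max(entropy(u), entropy(v)) - emi
    if denom == 0:
        return 0.0  # degenerate case; this convention is an arbitrary choice
    return (mi - emi) / denom


if __name__ == "__main__":
    u = [0, 0, 1, 1, 2, 2]
    v = [0, 0, 1, 1, 2, 2]
    print("MI :", mutual_information(u, v))
    print("EMI:", expected_mutual_information(u, v))
    print("AMI:", adjusted_mutual_information(u, v))
```

With only six points in three clusters, the expected mutual information is a sizeable fraction of the maximum attainable value, which is the small-sample baseline effect the abstract refers to.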

Keywords: videolectures, ocwc, oec

Disciplines:

Science and Technology / Computer Science / Programming & Programming Languages


More about this material

Material Type: Presentation
Date Added to MERLOT: February 8, 2015
Date Modified in MERLOT: February 8, 2015
Author: Nguyen Xuan Vinh, School of Electrical Engineering and Telecommunications, University of New South Wales
Submitter: The Open Education Consortium
Primary Audience: College General Ed, College Lower Division, College Upper Division
Technical Format: Video

Mobile Compatibility: Not specified at this time
Language: English
Cost Involved: No
Source Code Available: No
Creative Commons: This work is licensed under an Attribution-NonCommercial-NoDerivs 3.0 United States license
