The Use of Randomization and Statistical Significance in Data Mining

This video was recorded at Practical Theories for Exploratory Data Mining (PTDM), Brussels 2012. The concept and theory of statistical significance testing are well established in the traditional setting, but not in the problem settings that arise in data mining. In this talk I discuss the formulation, as well as the advantages and limitations, of statistical significance testing approaches in data mining. A data mining problem, where the objective is to find patterns such as frequent sets or clusterings, can be formulated as a statistical significance testing problem if one can (i) define a null hypothesis, (ii) formulate one or more reasonable test statistics, and (iii) either map patterns to constraints on the null hypothesis or map each pattern to a test statistic of its own, the latter case resulting in a...
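The three ingredients listed in the abstract can be illustrated with a small randomization test. The following is a minimal sketch, not the speaker's method: it assumes a null hypothesis that two items occur independently across transactions, uses the support of the item pair as the test statistic, and samples from the null by shuffling one item's column so that each item's frequency is preserved. The function names, the toy data, and the choice of 999 samples are all assumptions made for illustration.

```python
import random


def support(col_a, col_b):
    """Test statistic: number of transactions containing both items."""
    return sum(1 for a, b in zip(col_a, col_b) if a and b)


def empirical_p_value(col_a, col_b, n_samples=999, seed=0):
    """One-sided empirical p-value for the support of the item pair.

    Null hypothesis (an assumption of this sketch): the two items occur
    independently, given their individual frequencies. Samples from the
    null are drawn by shuffling the second column.
    """
    rng = random.Random(seed)
    observed = support(col_a, col_b)
    shuffled = list(col_b)
    at_least_as_extreme = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        if support(col_a, shuffled) >= observed:
            at_least_as_extreme += 1
    # Adding 1 to numerator and denominator counts the observed data as
    # one of the samples, so the p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_samples + 1)


# Hypothetical toy data: 20 transactions, items encoded as 0/1 columns.
item_a = [1] * 10 + [0] * 10
item_b = [1] * 10 + [0] * 10   # always co-occurs with item_a
item_c = [1, 0] * 10           # co-occurs with item_a about as often as chance

obs_ab, p_ab = empirical_p_value(item_a, item_b)
obs_ac, p_ac = empirical_p_value(item_a, item_c)
print("support(a, b) =", obs_ab, "p =", p_ab)
print("support(a, c) =", obs_ac, "p =", p_ac)
```

In this toy setting the pair (a, b) should receive a small p-value and the pair (a, c) a large one. When many patterns are each given a test statistic of their own, the resulting p-values would additionally need a multiple-testing correction, which this sketch does not attempt.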