Factorizing Gigantic Matrices

This video was recorded at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Athens 2011.

Low-rank approximations of data matrices have become an important tool in machine learning and data mining. They embed high-dimensional data in lower-dimensional spaces and can therefore mitigate the effects of noise, uncover latent relations, or facilitate further processing. These properties have proven useful in many application areas, such as bioinformatics, computer vision, text processing, recommender systems, and social network analysis.

Present-day technologies are characterized by exponentially growing amounts of data. Recent advances in sensor technology, Internet applications, and communication networks call for methods that scale to very large and/or growing data matrices. In this tutorial, we discuss basic characteristics of matrix factorization and introduce several recent approaches that scale to modern massive data analysis problems.

The tutorial aims at a wide audience, as it reviews both machine learning and data mining techniques. It is intended for PhD students, practitioners, and researchers who are interested in large-scale machine learning and data analysis. The tutorial is divided into three parts:

Part I: Matrix Factorization - Traditional Optimization Approaches and Statistical Foundations: In this block, we discuss the foundations and multi-linear extensions of traditional methods such as SVD, PCA, K-Means, and Vector Quantization.

Part II: Constraint Matrix Factorization: Many real-world applications of matrix factorization impose constraints on the factorization problem. For instance, matrix factors may need to be non-negative, convex combinations of existing data, or compact binary codes. Among others, we discuss techniques such as Spectral Hashing, NMF, Archetypal Analysis, CNMF, and CH-NMF.
Part III: Data-driven Matrix Factorization Techniques: The first and second parts of the tutorial consider norm minimization problems to obtain suitable matrix factors. Recent approaches that extend matrix factorization to massive data take a different point of view: they attempt to maximize the volume of a selection of rows and columns of a given data matrix. In this final part of the tutorial, we present and review approaches such as FastMap, CUR, CMD, and SiVM.

In each of the parts, we present practical applications from fields such as image processing, computer vision, robotics, web mining, and social media analysis.
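
To illustrate the norm-minimization view of low-rank approximation mentioned in the abstract, here is a minimal NumPy sketch of a truncated SVD. It is not taken from the tutorial: the helper name `truncated_svd`, the synthetic data, and the chosen rank are all assumptions made for the example.

```python
import numpy as np

def truncated_svd(X, k):
    """Hypothetical helper: factor X into W (n x k) and H (k x d)
    so that W @ H is the best rank-k approximation of X in Frobenius norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :k] * s[:k]   # scale the leading left singular vectors
    H = Vt[:k, :]          # leading right singular vectors
    return W, H, s

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): a rank-3 signal plus small noise.
signal = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
X = signal + 0.01 * rng.normal(size=(200, 50))

k = 3
W, H, s = truncated_svd(X, k)

# Relative reconstruction error of the rank-k approximation.
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# The Frobenius error equals the root of the sum of squared discarded singular values.
tail = np.sqrt(np.sum(s[k:] ** 2))
print(f"relative error: {rel_err:.4f}")
```

Each row of `W` is a 3-dimensional embedding of the corresponding 50-dimensional data row. A full SVD like this does not scale to the very large matrices the tutorial targets, which is part of the motivation for the selection-based methods of Part III.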

