Comparison of information retrieval techniques: Latent semantic indexing (LSI) and Concept indexing (CI)

This video was recorded at a Solomon seminar.

Information retrieval in the vector space model is based on literal matching of terms in documents and queries. The model is implemented by building a term-document matrix whose entries are derived from the frequencies of terms in documents. Literal matching of terms does not necessarily retrieve all relevant documents: synonymy (several words sharing the same meaning) and polysemy (one word having several meanings) are two major obstacles to efficient information retrieval. Latent semantic indexing (LSI) and concept indexing (CI) are information retrieval techniques embedded in the vector space model that address the problems of synonymy and polysemy.

LSI uses a low-rank singular value decomposition (SVD) of the term-document matrix. Although LSI has been empirically successful, its low-rank approximation lacks an interpretation and, consequently, offers no controls for accomplishing specific information retrieval tasks. CI instead lowers the rank of the term-document matrix using the centroids of document clusters, the so-called concept decomposition (CD). Here we compare SVD/LSI and CD/CI in terms of matrix approximation and retrieval precision.
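The two rank-lowering approaches can be contrasted on a small example. The following is a minimal sketch, not the method presented in the talk, and it rests on several assumptions: NumPy, a hypothetical six-document toy corpus with raw term counts, documents scaled to unit length, spherical k-means (with a deterministic farthest-first start) as the clustering step, and query scoring by cosine similarity against the columns of each rank-k approximation, which is only one of several ways to score queries. The helper names (`lsi_approx`, `spherical_kmeans`, `concept_decomposition`, `cosine_scores`) are illustrative.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents (raw term counts).
TERMS = ["car", "automobile", "engine", "fruit", "apple", "juice"]
A = np.array([
    [2, 0, 1, 0, 0, 0],  # car
    [0, 2, 1, 0, 0, 0],  # automobile
    [1, 1, 2, 0, 0, 0],  # engine
    [0, 0, 0, 2, 1, 0],  # fruit
    [0, 0, 0, 1, 2, 1],  # apple
    [0, 0, 0, 0, 1, 2],  # juice
], dtype=float)

# Scale each document (column) to unit Euclidean length.
X = A / np.linalg.norm(A, axis=0, keepdims=True)


def cosine_scores(q, M):
    """Cosine similarity between query vector q and every column of M."""
    norms = np.linalg.norm(M, axis=0) * np.linalg.norm(q)
    return (q @ M) / np.where(norms > 0, norms, 1.0)


def lsi_approx(X, k):
    """LSI: rank-k truncated SVD, X_k = U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def spherical_kmeans(X, k, iters=20):
    """Cluster unit-length columns of X; return unit concept vectors C and labels."""
    # Deterministic farthest-first initialisation (an illustrative choice).
    idx = [0]
    for _ in range(1, k):
        idx.append(int(np.argmin((X[:, idx].T @ X).max(axis=0))))
    C = X[:, idx].copy()
    for _ in range(iters):
        labels = np.argmax(C.T @ X, axis=0)  # assign each document by cosine similarity
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1]:
                c = members.sum(axis=1)
                C[:, j] = c / np.linalg.norm(c)  # normalised centroid = concept vector
    return C, labels


def concept_decomposition(X, k):
    """CD: X ~ C Z, where Z solves the least-squares problem min ||X - C Z||_F."""
    C, labels = spherical_kmeans(X, k)
    Z, *_ = np.linalg.lstsq(C, X, rcond=None)
    return C @ Z, labels


k = 2
X_lsi = lsi_approx(X, k)
X_cd, labels = concept_decomposition(X, k)

# Frobenius-norm approximation errors; the truncated SVD is optimal among rank-k matrices.
err_svd = np.linalg.norm(X - X_lsi)
err_cd = np.linalg.norm(X - X_cd)

# Query containing only the term "car".
q = np.array([1.0 if t == "car" else 0.0 for t in TERMS])
literal = cosine_scores(q, X)
lsi = cosine_scores(q, X_lsi)
ci = cosine_scores(q, X_cd)

print(f"Frobenius error  SVD: {err_svd:.3f}  CD: {err_cd:.3f}")
print("literal:", np.round(literal, 2))
print("LSI:    ", np.round(lsi, 2))
print("CI:     ", np.round(ci, 2))
```

With literal matching, the second document (which says "automobile" but never "car") scores exactly zero for the query "car". Both rank-2 approximations give it a positive score because it shares the term "engine" with the car documents, which is the synonymy effect described above. The SVD error can never exceed the CD error, yet the CD factors are directly interpretable: each column of `C` is the normalised centroid of one document cluster.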
