Identifying the Original Contribution of a Document via Language Modeling

This video was recorded at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Bled 2009. One major goal of text mining is to provide automatic methods that help humans grasp the key ideas in ever-increasing text corpora. To this end, we propose a statistically well-founded method for identifying the original ideas that a document contributes to a corpus, focusing on self-referential diachronic corpora such as research publications, blogs, email, and news articles. Our statistical model of passage impact defines (interesting) original content through a combination of impact and novelty, and the model is used to identify each document's most original passages. Unlike heuristic approaches, the statistical model is extensible...
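The idea of scoring passages by a combination of novelty and impact can be illustrated with a small sketch. This is not the model from the talk, whose details are not given here. It assumes a simplified stand-in: add-one-smoothed unigram language models, with novelty taken as how poorly earlier documents predict a passage and impact as how well later documents predict it. The function names (`unigram_model`, `avg_log_prob`, `originality`, `most_original`) and the additive combination of the two terms are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def unigram_model(texts, vocab_size):
    """Return an add-one-smoothed unigram probability function (illustrative assumption)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return lambda w: (counts[w] + 1) / (total + vocab_size)


def avg_log_prob(passage, prob):
    """Average per-word log-probability of a non-empty passage under a model."""
    words = passage.lower().split()
    return sum(math.log(prob(w)) for w in words) / len(words)


def originality(passage, earlier_docs, later_docs, vocab_size):
    """Hypothetical score: novelty w.r.t. earlier text plus impact on later text."""
    # Novelty: the passage is poorly predicted by documents written before it.
    novelty = -avg_log_prob(passage, unigram_model(earlier_docs, vocab_size))
    # Impact: the passage is well predicted by documents written after it.
    impact = avg_log_prob(passage, unigram_model(later_docs, vocab_size))
    return novelty + impact


def most_original(passages, earlier_docs, later_docs):
    """Pick the passage with the highest illustrative originality score."""
    all_text = list(passages) + list(earlier_docs) + list(later_docs)
    vocab_size = len({w for t in all_text for w in t.lower().split()})
    return max(
        passages,
        key=lambda p: originality(p, earlier_docs, later_docs, vocab_size),
    )
```

With a toy corpus, a passage that repeats earlier vocabulary scores low on novelty, while a passage whose wording reappears in later documents scores high on impact. A real system would need a stronger language model and a principled way to weigh the two terms.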