Using linguistic information as features for text categorization

This video was recorded at the NATO Advanced Study Institute on Mining Massive Data Sets for Security. We report on some experiences using linguistic information as additional features in a classical Vector Space Model [10]. Information extracted for every word, such as its part of speech, stem, and lexical root, has been combined in different ways to test for a possible improvement in classification performance across several algorithms, such as SVM [3], BBR [] and PLAUM [6]. Automatic Text Classification, also known as Automatic Text Categorization, tries to relate documents to a predefined set of classes. Extensive research has been carried out on this subject [11], and a wide range of techniques are applicable to this task: feature extraction [5], feature weighting,...
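As a rough illustration of what combining per-word linguistic information into Vector Space Model features can look like, here is a minimal Python sketch. It is not the method used in the talk: the annotated tokens, the scheme names ("word", "stem", "word+pos", "stem+pos"), and the helper functions are all hypothetical, and the part-of-speech tags and stems are hand-written rather than produced by a real tagger or stemmer.

```python
from collections import Counter

# Hypothetical pre-annotated tokens: (word, stem, part-of-speech tag).
# The values are hand-written for illustration; in practice they would
# come from a tagger and a stemmer.
DOC = [
    ("mining", "mine", "VERB"),
    ("massive", "massiv", "ADJ"),
    ("data", "data", "NOUN"),
    ("sets", "set", "NOUN"),
    ("data", "data", "NOUN"),
]


def make_features(tokens, scheme):
    """Turn annotated tokens into feature strings under one scheme."""
    features = []
    for word, stem, pos in tokens:
        if scheme == "word":
            features.append(word)
        elif scheme == "stem":
            features.append(stem)
        elif scheme == "word+pos":
            features.append(f"{word}/{pos}")
        elif scheme == "stem+pos":
            features.append(f"{stem}/{pos}")
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return features


def term_frequency_vector(tokens, scheme):
    """Sparse term-frequency vector: feature string -> raw count."""
    return Counter(make_features(tokens, scheme))


if __name__ == "__main__":
    for scheme in ("word", "stem", "word+pos", "stem+pos"):
        print(scheme, dict(term_frequency_vector(DOC, scheme)))
```

Each scheme yields a different feature space for the same document, so the resulting vectors could be weighted and passed to any classifier to compare schemes against one another.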


