Language identification of documents and queries

This video was recorded at the 6th Russian Summer School in Information Retrieval (RuSSIR), Yaroslavl, 2012. Language identification is a relatively simple and well-solved task. In the talk, I will give an overview of existing standard techniques and discuss their application to two text types: crawled Web documents and user search queries. Both present specific challenges:

  • Web documents: multilinguality and genre variability.
  • Queries: they are often too short for reliable attribution, so extra data (user context) is needed to resolve potential ambiguity.

I will talk about Yandex's efforts to cope with these challenges.