Language identification of documents and queries
This video was recorded at the 6th Russian Summer School in Information Retrieval (RuSSIR), Yaroslavl 2012. Language identification is a relatively simple and largely solved task. In the talk, I will give an overview of existing standard techniques and discuss their application to two text types: crawled Web documents and user search queries. Both present specific challenges:
- Web documents: multilinguality and genre variability;
- queries: they are simply too short for reliable attribution, hence the need for extra data (user context) to resolve potential ambiguity.
I will talk about Yandex's efforts to cope with all of this.
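The abstract does not say which standard techniques the talk covers. One common baseline is a naive Bayes classifier over character n-grams. The sketch below is a hypothetical illustration of that baseline, not Yandex's method. It also shows why short queries need extra data: a query like "2012" carries no language evidence, so an optional prior standing in for user context decides the result. The names (`char_ngrams`, `NgramLanguageId`), the trigram size, the toy training sentences and the prior values are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams; spaces pad word boundaries."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageId:
    """Hypothetical sketch: naive Bayes over character n-grams, add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.totals = {}    # language -> total number of n-gram tokens
        self.vocab = set()  # all n-grams seen in any language

    def train(self, lang, texts):
        counter = self.counts.setdefault(lang, Counter())
        for text in texts:
            counter.update(char_ngrams(text, self.n))
        self.totals[lang] = sum(counter.values())
        self.vocab.update(counter)

    def log_likelihood(self, lang, text):
        counter, total = self.counts[lang], self.totals[lang]
        size = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        return sum(math.log((counter[g] + 1) / (total + size))
                   for g in char_ngrams(text, self.n))

    def scores(self, text, prior=None):
        """Log-posterior (up to a constant) for each language.

        prior: optional dict lang -> probability, a hypothetical stand-in for
        user context; languages it omits fall back to a uniform prior.
        """
        uniform = 1.0 / len(self.counts)
        prior = prior or {}
        return {lang: self.log_likelihood(lang, text)
                + math.log(max(prior.get(lang, uniform), 1e-12))
                for lang in self.counts}

    def identify(self, text, prior=None):
        s = self.scores(text, prior)
        return max(s, key=s.get)


# Toy training data, made up for illustration only.
model = NgramLanguageId(n=3)
model.train("en", ["the weather is nice today", "what time is it"])
model.train("ru", ["погода сегодня хорошая", "который час"])

print(model.identify("the weather today"))                  # en
print(model.identify("погода сегодня"))                     # ru
# A digits-only query carries no language evidence; the prior decides.
print(model.identify("2012", prior={"en": 0.9, "ru": 0.1}))  # en
print(model.identify("2012", prior={"en": 0.1, "ru": 0.9}))  # ru
```

Because the prior is added in log space, its weight is fixed while the n-gram evidence grows with text length: for a long Web document the n-grams dominate, and for a one-word query the context prior can tip the decision.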