
Cross-Lingual Document Retrieval through Hub Languages


This video was recorded at NIPS Workshops, Lake Tahoe 2012. We address the problem of learning similarities between documents written in different languages, for language pairs where little or no direct supervision (in the form of a comparable or parallel corpus) is available. To compensate for the lack of direct supervision, our approach exploits the fact that two such languages may be linked indirectly through a hub language: correspondences exist between each of the languages and a third, hub language. The main goal of our paper is to explore the viability of cross-lingual learning under such conditions. We propose a method that extracts a set of multilingual topics, which provide a common representation for documents in different languages. The method is suitable for a comparable multilingual corpus with missing documents. We evaluate the approach in a truly multilingual setting, performing document retrieval across eight Wikipedia languages.
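To make the hub-language idea concrete, the following is a minimal sketch in Python. It is not the method from the talk, which extracts multilingual topics. Instead it shows a much simpler pivot, assumed here for illustration only: each document is described by its similarity to the documents of its own language that are aligned with hub documents, and two documents in different languages are then compared through those hub-indexed profiles. All function names, the toy Spanish and German texts, and the hub indices are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hub_profile(doc, aligned):
    """Describe `doc` by its similarity to each hub-aligned document.

    `aligned` maps a hub document index to the comparable text in the
    same language as `doc`. Hub documents that have no counterpart in
    this language are simply absent (the "missing documents" case).
    """
    vec = bow(doc)
    return {hub_idx: cosine(vec, bow(text)) for hub_idx, text in aligned.items()}

# Toy data (hypothetical). Hub document indices: 0 = music,
# 1 = football, 2 = chemistry. Spanish and German are never aligned
# with each other directly, only with the hub.
spanish_aligned = {
    0: "música concierto guitarra",
    1: "fútbol gol equipo",
    2: "química molécula reacción",
}
german_aligned = {
    0: "musik konzert gitarre",
    1: "fußball tor mannschaft",
    # hub document 2 has no German counterpart
}

query = "el equipo marcó un gol"  # Spanish query document
german_candidates = [
    "die mannschaft schoss ein tor",
    "das konzert mit gitarre",
]

# Compare the Spanish query with each German candidate via hub profiles.
query_profile = hub_profile(query, spanish_aligned)
scores = [
    cosine(query_profile, hub_profile(c, german_aligned))
    for c in german_candidates
]
best = max(range(len(scores)), key=scores.__getitem__)
print(best, german_candidates[best])
```

In this toy setup the football query is matched to the German football sentence even though the two share no vocabulary, because both are similar to documents aligned with the same hub document. A real system would need far richer representations than raw word overlap.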

