Material Detail

Collecting aligned textual corpora from the Hidden Web

Collecting aligned textual corpora from the Hidden Web

This video was recorded at W3C Workshop: Content on the Multilingual Web, Pisa 2011. With the constant growth of web based content large collections of textual become available. Many if not most professional non-English web sites offer translated webpages to English and other languages of their clients and partners. This are usually professional translation and are abundant. We call this Hidden Web. We intend to present possibilities, problems and best practices for harnessing such aligned textual corpora. Such data can then be efficiently used as a translation memory for example as help for a human translators or as training data for machine translation algorithms.

Quality

  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material

Comments

Log in to participate in the discussions or sign up if you are not already a MERLOT member.