Improving the quality of a customized SMT system using shared training data

At the MT Summit in Ottawa (August 28, 2009), Microsoft’s Chris Wendt presented the findings of a recent pilot project that used translation memories from more than ten TDA (TAUS Data Association) members to train Microsoft’s statistical machine translation engine.

The main tests were performed on two languages, Chinese and German, with customization for Sybase iAnywhere. Additional tests were run on Polish and Japanese, with customization for Adobe and Dell.

BLEU scores rose significantly, with increases of between 22% and 74% compared with engines trained purely on Microsoft data or generally available data. These tests suggest that adding parallel data from other organizations (in this case, data shared through the TAUS platform) improves output quality and overall system performance.
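BLEU, the metric behind these figures, scores a system's output by its clipped n-gram overlap with human reference translations, multiplied by a brevity penalty that punishes overly short output. The sketch below is a minimal illustration only, not Microsoft's evaluation tooling: it assumes whitespace tokenization, one reference per segment, uniform weights up to 4-grams and no smoothing. It also assumes the 22% to 74% figures are relative gains over the baseline score, which the summary does not state explicitly. The function names and toy sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per segment, uniform weights
    and no smoothing (hypothetical helper for illustration)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()  # assumption: whitespace tokenization
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: a hypothesis n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match; unsmoothed BLEU is 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if output is longer than the references, else exp(1 - r/c).
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def relative_gain(baseline, customized):
    """Percentage increase of a customized system's score over a baseline."""
    return (customized - baseline) / baseline * 100


# Toy usage with made-up sentences (not data from the pilot project).
refs = ["the committee approved the new budget for next year"]
baseline_out = ["the committee has approved the new budget for next year"]
custom_out = ["the committee approved the new budget for next year"]

base = corpus_bleu(baseline_out, refs)
custom = corpus_bleu(custom_out, refs)
print(f"baseline BLEU={base:.3f}, customized BLEU={custom:.3f}, "
      f"gain={relative_gain(base, custom):.1f}%")
```

Reporting the relative gain rather than the raw BLEU difference is what makes results comparable across language pairs whose baseline scores differ widely, which is one plausible reading of how a range such as 22% to 74% arises.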

The slides from this seminal presentation were published by TAUS.

