In a new online report, TAUS Data Association (TDA) highlights the fact that Pangeanic is a first example of an LSP company that, making extensive...
TAUS has published a new guide on its website for anyone interested in working with MT and using their own data to create machine translation engines. The Technical Guide to SMT Training Data is intended for users and organizations keen to train engines with their own data. It deals with the preparation of translation training data for statistical machine translation (SMT), examining the data preparation processes (typically for bilingual TMX) that allow data and algorithms to work together. The report, written by Tom Hoar, also explores how to define an organization's training data strategy to match the overall system design and how to identify potential sources of bilingual, well-aligned TMX. It discusses the challenges of merging corpora from multiple sources to create large but stable data sets, and explores several methods for turning translation memories from different sources into SMT training data. Finally, it looks into the speech roots of SMT and introduces the concept of exception management as a context for preparing SMT training data.

Pangeanic has used many bilingual data sets from several organizations, including the EU and the UN, to mix data and customize machine translation engines for some of its clients.
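To make the idea of merging translation memories from several sources more concrete, here is a minimal Python sketch. It is not taken from the TAUS guide: the function names (read_tmx_pairs, merge_corpora), the sample TMX strings, and the cleaning rules (whitespace normalization, dropping empty segments, removing exact duplicate pairs) are all assumptions chosen for illustration. Real SMT data preparation usually involves many more steps.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Yield (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Newer TMX uses xml:lang, older files use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is None:
                continue
            # itertext() flattens inline markup; split/join normalizes spaces.
            text = " ".join("".join(seg.itertext()).split())
            # Keep only the primary language subtag, e.g. "en-US" -> "en".
            segments[lang.split("-")[0]] = text
        src, tgt = segments.get(src_lang), segments.get(tgt_lang)
        if src and tgt:  # skip units with a missing or empty side
            yield src, tgt


def merge_corpora(tmx_documents, src_lang, tgt_lang):
    """Merge several TMX documents, dropping exact duplicate pairs."""
    seen = set()
    merged = []
    for document in tmx_documents:
        for pair in read_tmx_pairs(document, src_lang, tgt_lang):
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


# Hypothetical sample data standing in for two different TM sources.
TMX_A = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello  world</seg></tuv>
    <tuv xml:lang="es"><seg>Hola mundo</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Thank you</seg></tuv>
    <tuv xml:lang="es"><seg></seg></tuv></tu>
</body></tmx>"""

TMX_B = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Hello world</seg></tuv>
    <tuv xml:lang="es-ES"><seg>Hola mundo</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Good night</seg></tuv>
    <tuv xml:lang="es-ES"><seg>Buenas noches</seg></tuv></tu>
</body></tmx>"""

if __name__ == "__main__":
    for source, target in merge_corpora([TMX_A, TMX_B], "en", "es"):
        print(source, "|||", target)
```

The merged list keeps the first occurrence of each pair in source order. A production pipeline would typically also filter by length ratio, check alignment quality, and handle character encoding, none of which this sketch attempts.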
Full report:
https://www.taus.net/think-tank/reports/translate-reports/technical-guide-to-smt-training-data
Further reading:
- Knowledge Center: What is translation memory?
- Translation Technology: Translation Memory