
Training Data: Technical guide to SMT

TAUS has published on its website a new guide for anyone interested in working with MT and using their own data to create machine translation engines. The Technical Guide to SMT Training Data is intended for users and organizations keen to train engines with their own data. It deals with the preparation of translation training data for statistical machine translation (SMT) and examines the data preparation processes (typically for bilingual TMX) that enable data and algorithms to work together.

TAUS' report, written by Tom Hoar, also explores how to define an organization's training data strategy to match the overall system design, and how to identify potential sources of bilingual, well-aligned TMX. It discusses the challenges faced when merging corpora from multiple sources to create large but stable data sets, and explores several methods for turning translation memories from several sources into SMT training data. Finally, it looks into the speech roots of SMT and introduces the concept of exception management as a context for preparing SMT training data.

Pangeanic has made use of many bilingual data sets from several organizations, including the EU and the UN, in order to mix data and customize machine translation engines for some of its clients.

Complete news: https://www.taus.net/think-tank/reports/translate-reports/technical-guide-to-smt-training-data

Further reading: TAUS Technical Guide to SMT training data
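To make the idea of turning translation memories from several sources into SMT training data more concrete, here is a minimal sketch in Python. It is not the method described in the TAUS guide. The function names (`read_tmx_pairs`, `merge_corpora`), the length-ratio threshold and the sample segments are all assumptions made for illustration, and the sample TMX is heavily simplified (for example, its header carries none of the attributes a real TMX file would have).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Yield (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Join all text in the segment (including text nested in
                # inline elements) and collapse runs of whitespace.
                text = " ".join("".join(seg.itertext()).split())
                # Keep only the primary language subtag: "en-GB" -> "en".
                segs[lang.split("-")[0]] = text
        src, tgt = segs.get(src_lang), segs.get(tgt_lang)
        if src and tgt:
            yield src, tgt


def merge_corpora(corpora, max_ratio=3.0):
    """Merge several iterables of pairs, dropping duplicates and pairs
    whose character lengths differ by more than max_ratio (an assumed,
    illustrative heuristic for spotting misaligned segments)."""
    seen = set()
    merged = []
    for pairs in corpora:
        for src, tgt in pairs:
            if not src or not tgt:
                continue
            key = (src.lower(), tgt.lower())  # case-insensitive duplicate check
            if key in seen:
                continue
            ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
            if ratio > max_ratio:
                continue
            seen.add(key)
            merged.append((src, tgt))
    return merged


# Two tiny, made-up translation memories standing in for different sources.
SAMPLE_A = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Press the  green button.</seg></tuv>
<tuv xml:lang="es"><seg>Pulse el botón verde.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>OK</seg></tuv>
<tuv xml:lang="es"><seg>Este segmento está claramente mal alineado.</seg></tuv></tu>
</body></tmx>"""

SAMPLE_B = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-GB"><seg>press the green button.</seg></tuv>
<tuv xml:lang="es-ES"><seg>Pulse el botón verde.</seg></tuv></tu>
<tu><tuv xml:lang="en-GB"><seg>Close the lid.</seg></tuv>
<tuv xml:lang="es-ES"><seg>Cierre la tapa.</seg></tuv></tu>
</body></tmx>"""

merged = merge_corpora([
    read_tmx_pairs(SAMPLE_A, "en", "es"),
    read_tmx_pairs(SAMPLE_B, "en", "es"),
])

# Write the line-aligned source and target files that SMT toolkits
# typically expect as training input.
with open("train.en", "w", encoding="utf-8") as f_src, \
        open("train.es", "w", encoding="utf-8") as f_tgt:
    for src, tgt in merged:
        f_src.write(src + "\n")
        f_tgt.write(tgt + "\n")
```

In this sketch the badly aligned "OK" pair is filtered out by the length-ratio check and the repeated "green button" pair from the second memory is dropped as a duplicate, so two pairs remain. Real-world preparation would normally add further steps, such as tokenization, encoding repair and language identification, that are left out here.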

Next time you think languages, think Pangeanic
Your Machine Translation Customization Solutions


