The NTEU Consortium that Pangeanic has been leading since 2019 has completed its massive data upload to ELRC, and the resulting neural engines are now available to European Public Administrations via the European Language Grid.

The NTEU project's goal was to gather and re-use language resources from several European CEF projects in order to build near-human-quality machine translation engines for use by the Public Administrations of EU Member States. This massive engine-building endeavor covered every possible combination of the EU's official languages, from high-resource pairs such as English into Spanish, German or French to low-resource pairs such as Latvian, Finnish or Bulgarian into Greek, Croatian or Maltese. Every engine was tested with MTET (Machine Translation Evaluation Tool), an evaluation tool developed specifically for the project. MTET ranked the performance of the direct-combination engines (i.e., engines that do not "pivot" through English) against a set of free online engines. Two graders ranked every engine (language combination) in order to normalize human judgement and assess how close each engine's output was to a reference human expression.
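To make the direct-versus-pivot distinction concrete, here is a minimal, hypothetical sketch. The `translate` function is an invented placeholder, not an NTEU or MTET API; it simply shows that a pivot setup chains two engines through English, whereas a direct-combination engine translates in a single step.

```python
# Hypothetical sketch: direct vs. pivot translation. `translate` is a
# placeholder, not a real NTEU or MTET call; here it just tags the text
# so the two call patterns can be compared.

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for a single MT engine call from `src` into `tgt`."""
    return f"[{src}->{tgt}] {text}"

def direct_translation(text: str, src: str, tgt: str) -> str:
    # A direct-combination engine covers the pair in one step, e.g. lv -> el.
    return translate(text, src, tgt)

def pivot_translation(text: str, src: str, tgt: str) -> str:
    # A pivot setup chains two engines through English (lv -> en -> el),
    # adding a second chance for errors to creep in.
    intermediate = translate(text, src, "en")
    return translate(intermediate, "en", tgt)

print(direct_translation("Labdien", "lv", "el"))  # [lv->el] Labdien
print(pivot_translation("Labdien", "lv", "el"))   # [en->el] [lv->en] Labdien
```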
Fig. 1: A view of the Machine Translation Evaluation Tool, MTET

Human graders could leave unclear evaluations unfinished if they needed to stop and come back later, although evaluating segments consecutively, one sentence after another, was preferable. As the evaluation screen below shows, some language combinations (Irish Gaelic into Greek) were a challenge!
Fig. 2: Typical evaluation screen

To guarantee the fairness of the final quality assessment, human graders did not know which output came from the NTEU engines and which came from the generalist online MT provider used as a benchmark. They ranked each output by moving a slider on a scale from 0 to 100. The aim was for graders to assess whether the machine-generated sentence adequately expressed the meaning of the source, that is, how close it was to what a human would have written.
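As an illustration only, here is a minimal Python sketch of such a blind, slider-style evaluation round. This is not the MTET code: the data structures and the `ask_grader` stub are invented stand-ins for the real tool's interface; the only facts taken from the text are the 0-100 scale and the hidden system identity.

```python
import random

# Illustrative sketch of a blind evaluation round (not the actual MTET code).
# Each item pairs a source sentence with two candidate outputs: one from an
# NTEU engine and one from the online benchmark system.
items = [
    {
        "source": "Source sentence ...",
        "candidates": {
            "nteu": "NTEU engine output ...",
            "benchmark": "Online benchmark output ...",
        },
    },
]

def ask_grader(source: str, candidate: str) -> int:
    """Stand-in for the UI slider: in MTET the grader picks a 0-100 value."""
    return random.randint(0, 100)  # dummy value so the sketch runs end to end

def blind_round(items):
    """Collect 0-100 scores without revealing which system produced what."""
    scores = {"nteu": [], "benchmark": []}
    for item in items:
        systems = list(item["candidates"])
        random.shuffle(systems)  # hide system identity and presentation order
        for system in systems:
            score = ask_grader(item["source"], item["candidates"][system])
            scores[system].append(score)  # identity kept only for the analysis
    return scores

print(blind_round(items))
```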
Evaluation Criteria
Another challenge was to standardize human criteria. Different people have different linguistic preferences, which can affect how they evaluate a sentence, so it was important from the outset that everyone followed the same scoring guidelines. To standardize criteria, Pangeanic, together with the Barcelona Supercomputing Center, laid out a set of instructions based on academically proven methods, so that all evaluators applied the same scoring approach across languages. Unlike SMT, which is typically measured with automatic BLEU scores, NMT output needed to be ranked on accuracy, fluency and terminology. These three key items were defined as follows:

- Accuracy: the sentence conveys the meaning of the original, even if synonyms are used.
- Fluency: the grammatical correctness of the sentence (gender agreement, singular/plural, case declension, etc.).
- Adequacy [Terminology]: the correct use of the in-domain terms agreed between the client and the developer for use in production, which may not be standard or general terms (the specific jargon).

When ranking a sentence, the following weights were typically applied (see the worked example after the list):

- Accuracy: 33%
- Fluency: 33%
- Adequacy [Terminology]: 33%
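As a hedged illustration of how such weights might combine into a single sentence score, here is a short Python sketch. The function names and example scores are invented; the only facts taken from the text are the three criteria, their roughly equal weights, and the use of two graders whose judgements are normalized.

```python
# Illustrative sketch only: combine per-criterion scores (0-100) into a
# single sentence score using the roughly equal weights described above,
# then average the two graders to normalize human judgement.
WEIGHTS = {"accuracy": 1 / 3, "fluency": 1 / 3, "adequacy": 1 / 3}

def sentence_score(criterion_scores: dict) -> float:
    """Weighted sum of the three criteria for one grader and one sentence."""
    return sum(WEIGHTS[c] * criterion_scores[c] for c in WEIGHTS)

def normalized_score(grader_a: dict, grader_b: dict) -> float:
    """Average the two graders' weighted scores for the same sentence."""
    return (sentence_score(grader_a) + sentence_score(grader_b)) / 2

# Example with invented scores: two graders rate the same engine output.
a = {"accuracy": 90, "fluency": 80, "adequacy": 70}
b = {"accuracy": 85, "fluency": 75, "adequacy": 80}
print(round(normalized_score(a, b), 1))  # -> 80.0
```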