The NTEU Consortium, which Pangeanic has led since 2019, has completed the massive data upload to ELRC, and the neural engines are now available to European Public Administrations via the European Language Grid. The NTEU project's goal was to gather and re-use language resources from several European CEF projects in order to build near-human-quality machine translation engines for Public Administrations in EU Member States. This massive engine-building endeavor covered every combination of the EU's official languages, from high-resource pairs such as English, Spanish, German or French to low-resource combinations such as Latvian, Finnish or Bulgarian into Greek, Croatian or Maltese.

Every engine was tested with MTET (Machine Translation Evaluation Tool), an evaluation tool developed specifically for the project. MTET ranked the performance of the direct-combination engines (i.e., engines that do not "pivot" through English) against a set of free online engines. Two graders ranked every single engine (language combination) in order to normalize human judgement and assess how close each engine's output was to a reference human expression.

Fig. 1: A view of the Machine Translation Evaluation Tool (MTET)

Human graders could leave unclear evaluations unfinished (if they needed to stop and come back later), although evaluating segments consecutively, one sentence after another, was preferable. As can be seen below, some language combinations (Irish Gaelic into Greek) were a challenge!

Fig. 2: Typical Evaluation Screen

To guarantee final quality, human graders did not know which output came from the NTEU engines and which came from a second translation by a generalist online MT provider that was used as a benchmark. They scored each output by moving a slider from 0 to 100.
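The blind setup described above can be sketched in a few lines of Python: the two candidate translations are shuffled before being shown, so graders cannot tell which engine produced which. All function and field names here are illustrative assumptions, not MTET's actual implementation.

```python
import random

def blind_pair(source, nteu_output, benchmark_output, rng=random):
    """Present two anonymized candidate translations for grading.

    Hypothetical sketch: the hidden key records which engine produced
    which candidate, so scores can be attributed after grading.
    """
    candidates = [("nteu", nteu_output), ("benchmark", benchmark_output)]
    rng.shuffle(candidates)  # graders must not know the provenance
    display = {
        "source": source,
        "candidate_a": candidates[0][1],
        "candidate_b": candidates[1][1],
    }
    key = {"candidate_a": candidates[0][0], "candidate_b": candidates[1][0]}
    return display, key

display, key = blind_pair("Bonjour", "Hello", "Good day")
```

Keeping the engine labels in a separate `key` structure, rather than alongside the displayed texts, is what makes the comparison blind: the grading interface only ever receives `display`.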
The aim was for graders to assess, during the evaluation, whether the machine-generated sentence adequately expressed the meaning of the source, that is, how close it was to what a human would have written. Each output was scored against three equally weighted criteria:
- Accuracy: 33%
- Fluency: 33%
- Adequacy [terminology]: 33%
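A minimal sketch of how such equally weighted criteria could be combined into one overall 0–100 score, and how the two graders' judgements could be normalized by averaging. The weights follow the list above (treating the 33% figures as equal thirds); the function names and the averaging step are assumptions, not MTET's documented formula.

```python
# Equal weighting of the three criteria listed above (assumed to be
# exact thirds; the article rounds each to 33%).
CRITERIA_WEIGHTS = {"accuracy": 1 / 3, "fluency": 1 / 3, "adequacy": 1 / 3}

def overall_score(scores):
    """Combine per-criterion 0-100 slider scores into one 0-100 score."""
    return sum(CRITERIA_WEIGHTS[c] * scores[c] for c in CRITERIA_WEIGHTS)

def normalize_graders(grader_scores):
    """Average several graders' scores to normalize human judgement."""
    return sum(grader_scores) / len(grader_scores)

segment = {"accuracy": 90.0, "fluency": 80.0, "adequacy": 85.0}
print(overall_score(segment))            # 85.0
print(normalize_graders([80.0, 90.0]))   # 85.0
```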