Practitioner Translate at LocWorld

I will describe the rest of the very interesting Pre-conference Day, with presentations by the organizers (TAUS) and by three other companies that are either machine translation developers or practitioners of automated translation solutions.

The presentations brought different perspectives on the machine translation landscape, covering the efforts and advances of several companies, including Pangeanic.

Maxim Khalilov from TAUS summarized the good work being done by the organization within the MosesCore project. His presentation was an invitation to visit and find out more about its tools, data and resources. Among the tools, he mentioned alternatives to Moses (Thot, for instance), as well as a collection of TAUS features for quality evaluation. Important for new entrants, or for anyone with an interest in MT, is the collection of data at TAUS: Europarl (1.8 million sentences), JRC-Acquis (270 paragraphs), Hansards, UN, OPUS, LDC (Linguistic Data Consortium) and ELRA, as well as TAUS’ own repositories and many other features TAUS is developing.

Safaba's Udi Hershkovich presented his company’s effort to offer a commercial application that translates not only documentation but all company communications, chat, etc. It is geared towards companies producing enormous amounts of content across many different channels. He touched upon problems developers encounter when applying the technology in several areas. One example is translating HTML: even with perfectly formed HTML, MT will translate the keywords as well, and the result may not be the right keyword for another market. This is an area for development and improvement for everyone. A point of interest to many is that Moses needs (or used to need) a lot of data, whereas with other technologies only small data sets are needed to build an MT engine. For example, one can look at similar translations, learn from them and then train engines.

Lori Thicke from LexWorks presented a pragmatic approach to the application of machine translation solutions. Lori talked about working with different engines and taking an agnostic approach to machine translation, choosing according to the language pair. For Lori, MT is not a tool but a process, and a process that needs to be understood in order to be used properly: one needs to understand training sets. Lori provided relevant ratings for “understandable” translations produced by systems such as MS Translator, Systran Hybrid, etc., when translating online content.

Rahzeb and Max went on to describe further EU-funded MosesCore projects by TAUS, such as the Quality Framework, and how the consortium is working on quality assessment and measurement. I found this and the later sessions by Lucia Specia and QTLaunchPad on post-editing very enlightening, since they offered an overview of the tools developed to provide metrics. I believe that, just as resistance to the adoption of computer-assisted tools in the 1990s diminished as the tools became desktop applications, resistance to MT is now diminishing too.

In addition, metrics and the ubiquity of machine translation will provide objective criteria that raise the credibility of different translation technologies and of standard QA functions across the industry. Users need to feel, and actually be, empowered if they are to engage with the technology and use it.

That has long been Pangeanic’s philosophy (see our presentation at Localization World Barcelona, 2011). Only when the technology is no longer seen as imposed from above, but is perceived as “yet another tool to enhance productivity”, will machine translation stop being considered a threat and become mainstream. But we had better deal with that in our next entry.
