DIY Machine Translation at TAUS Barcelona 2011

Written by M.Ángeles García | 10/11/11
Pangeanic’s attendance at the machine-translation summit TAUS Barcelona 2011 generated well-earned interest, as the company’s self-training, DIY machine translation technologies had been unveiled in a recent press release. The announcement that Pangeanic’s PangeaMT package will provide users with a customized API to its different engines became a hot topic after the news that Google will deprecate the free version of its Translate API and that “The courtesy limit for existing Translate API v2 projects created prior to August 24, 2011 will be reduced to zero on December 1, 2011.”

At TAUS, heated discussions focused on interoperability. It is becoming the buzzword for the future if you are dealing with software or translation, and even more so if you are producing translation software (machine translation software in our case). You can view the presentation about User Empowerment “Provide the tools to those who need them” here. Pangeanic is moving away from the license philosophy in order to give machine-translation practitioners access to its technology. Where this is not possible, the traditional SaaS model will always be available, with in-house or cloud training.

In a nutshell:
- Lack of interoperability costs money, even though most vendors and clients can’t quantify it.
- Some are calling for organizations or leaders.
- TM leverage drops by 20% when switching vendors because different tools are being used.
- Some translators simply refuse to use one tool or another. Some tools also make translators less productive, thus pushing rates up.
- The current mix of free, cloud-based, licensed, SaaS and LSP-hosted tools is too much of an offering, as the tools do not talk to each other. Perhaps new models are required. Human translators’ lives should be made easier, rather than having them focus on handling formatting.

Several industry players (Iris Orriss from Microsoft, Karen Combe from PTC, Minette Norman from Autodesk and Eric Blassin from Lionbridge) stated that there are interoperability issues between software and documentation and that they cost money, and that CMS doesn’t work well with TMS even when both come from the same supplier. Suppliers can’t find ways to keep up and solve interoperability problems. Ideally, the UI should be used together with the documentation, but this is very difficult and there is no budget for it, even though inconsistencies do occur. LSPs are left with the burden of the costs and with reduced potential for innovation and efficiency. You are at the mercy of the tools, and sometimes finding the best solution is a matter of trial and error. Lack of interoperability is frustrating.

At TAUS, Smith Yewell, CEO of Welocalize, commented that lack of interoperability is impairing productivity and undermining profitability. We calculated it costs us $3M in passing formats up and down and in manual work. Because formatting affects the transfer of data from our repositories (TMS or databases in MySQL, Oracle) to the translation environment (Trados), quality issues happen, as the wrong translator can be matched to the wrong job. Personally, I would agree that this is the case with most LSPs facing conversion problems with different publishing, web and other formats, from InDesign to Flash, HTML and even DOC/RTF. Having a single, industry-wide format would level the playing field and increase efficiency.

Interesting presentations came from the EU, which is the largest buyer of translation services. They are embracing the Moses platform to solve part of their problems in more than 400 language combinations.
The task they face is massive, as they have to work with every kind of combination, from straightforward Romance languages and English to morphologically rich Eastern European languages and agglutinative Finnish and Hungarian. It is a daunting task, but there are also many EU-sponsored R&D programs which can eventually feed back and help with the solution. Spyridon Pilos (EU’s DGT) stated that in May 2010 a Commission task force was confirmed to address the need for MT. In December 2010 the ECMT service (a rule-based system, Systran) was suspended, so the EU is looking at open-source software. The new name is MT@EC and it has to be built on trust, confidentiality and continuity. The EU is building a data-driven system using all its internal TMs, cleaning, preparing and filtering them and then processing them for MT. Benchmarks are being established internally with basic Moses releases; they will then set up SMT engines and develop user interfaces and tools for capturing feedback in order to improve them. They are also using and checking Apertium for certain language pairs.
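
To give a rough idea of what "cleaning and filtering" TM data before SMT training can involve, here is a minimal Python sketch. It is not the EU's actual pipeline: the function name `clean_segment_pairs` and the thresholds `max_tokens` and `max_ratio` are assumptions made up for illustration. It normalizes whitespace, then drops empty segments, overlong segments, pairs with a very lopsided length ratio and exact duplicates, which are typical steps applied to parallel data before training a statistical engine.

```python
from typing import Iterable, List, Tuple

SegmentPair = Tuple[str, str]  # (source segment, target segment)


def clean_segment_pairs(
    pairs: Iterable[SegmentPair],
    max_tokens: int = 80,    # hypothetical threshold, tune per corpus
    max_ratio: float = 3.0,  # hypothetical threshold, tune per language pair
) -> List[SegmentPair]:
    """Return the segment pairs that pass some basic sanity filters."""
    seen = set()
    cleaned: List[SegmentPair] = []
    for source, target in pairs:
        # Collapse runs of whitespace and strip leading/trailing blanks.
        source = " ".join(source.split())
        target = " ".join(target.split())
        # Drop pairs where either side is empty.
        if not source or not target:
            continue
        src_len = len(source.split())
        tgt_len = len(target.split())
        # Drop very long segments, which tend to align poorly.
        if src_len > max_tokens or tgt_len > max_tokens:
            continue
        # Drop pairs whose lengths differ too much (likely misaligned).
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue
        # Drop exact duplicates while keeping the original order.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


if __name__ == "__main__":
    tm = [
        ("Hello  world", "Hola mundo"),
        ("Hello world", "Hola mundo"),   # duplicate after normalization
        ("", "Texto sin origen"),        # empty source
        ("Yes", "Sí , por supuesto que sí , claro"),  # lopsided lengths
    ]
    for src, tgt in clean_segment_pairs(tm):
        print(f"{src}\t{tgt}")
```

A real TM would normally be read from TMX or a database export rather than an in-memory list, and the thresholds would depend on the language pair: agglutinative languages such as Finnish or Hungarian produce fewer, longer words, so a token-based ratio that suits Romance languages may be too strict for them.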

DOWNLOADS:

Manuel Herranz Presentation