05/05/2011

Multilingual Web & Translator MT Awareness

April was a busy month in the conference-rich Northern-hemisphere season. GALA took place in Lisbon and, although Pangeanic - PangeaMT could not participate, there were plenty of other specialist venues to choose from.

Multilingual Web took place in Pisa on 4th and 5th April. The event was true to its premise of discussing standards and best practices for the Multilingual Web, and it gathered a good, specialist and multi-disciplinary crowd. The mixture was well planned: the sessions started with a keynote address by Oreste Signore and Kimmo Rossi, as well as a report from Ralf Steinberger about a multimedia news report service from the JRC (the organisation has made all presentations and videos available, so this blog entry was worth the wait). You can check all presentations here.

Not being an expert on the creation or development of the web, but being a keen open-standards supporter, I found it very enlightening to witness the discussions, as I am sure the localizers' session (mostly dealing with interoperability problems faced at localization time) was for creators and developers.

It was indeed enlightening to hear Richard Ishida talk about HTML5 and all the novelties it will bring. Richard is the head of the W3C Internationalization Department, and possibly the only one at the conference truly living between both worlds. The exchange of ideas and experiences was extremely useful: there were many thinking heads from many different areas.

SEO was an area I learnt a lot about in the time I spent with Gustavo Lucardi, from Trusted Translations. His blog report on the events of the second day should not be missed by fellow translation professionals. Trusted Translations has set new standards for the rest of us on SEO and Multilingual SEO.

On the first day, David Filip (now at Limerick) provided some food for thought when he said that TMX was dead, now that LISA is no more, and that XLIFF is the future (a more standardized and more interchangeable XLIFF 2.0, let's say). I agree that XLIFF is the foreseeable and recommended interchange format, and a lot more information and data could be exchanged if it were properly promoted (I wish it were) and did not become "some kind of property" of individual vendors, as happened when SDL created its own version (sdlxliff).

However, TMX 1.4b has become the de facto exchange format and it may take some time before we see the back of it, just as MP3 did not kill the CD, simply because so many things are compatible with it, from PC CD-ROM drives to in-car CD players (my reply). Finding out about new initiatives like InterOperability Now! (Sven C. Andrä) and M4Loc was very refreshing for us at PangeaMT, as we base our offering on open standards to ease the free exchange of information. They show an increasing interest in developing tools and standards that ease the rapid exchange of information, to which machine translation cannot but contribute.

Finally, a word about a busy trip which took me straight from Pisa to Brussels to participate in a video-linked forum (with Luxembourg and Strasbourg) aimed at raising translator awareness and acceptance of machine translation at the European institutions. Translators' fears and mistrust of MT are almost universal (check a recent tirade at ProZ about a new price scheme imposed on freelancers by a company which has introduced machine translation purely as a "price down" strategy). If LSPs like to talk about "working together" and "partnering" with their clients long-term, they need to think about how to work better with their translators.

Many information sessions and seminars are still required at institutions and organizations, and not just at management level, although Google Translate's ubiquity has done a lot of preparation work for what is to come in the next few years. At least machine translation is not a black box any more. Translators, as practitioners, need not only good MT output, but also an understanding of what is behind it, how they can make a difference and, above all, a well-planned workflow.

Post-editing is fun and those who get used to it never go back to a TM-based workflow. A lot of evangelism is still required during this transitional period, and those closest to the output need to feel in control of things and see for themselves that they can influence MT with their feedback and are more necessary than ever. It was good to see first-hand that the EU has embraced open-source Moses (and Apertium) and now has a whole team dedicated to developing its own solution.

With more than 420 language combinations into and out of complex and challenging languages (from Semitic Maltese to Romance, Germanic, non-related Baltic, and even non-Indo-European Finnish and Hungarian), many of which would be under-resourced were it not for the EU as a producer, this is a task worth keeping a close eye on, particularly after the EU's parting from Systran.

I should particularly praise the Portuguese presentation there, by Hilàrio Fontes, and the lessons learnt. The Portuguese Department at the EU has been a leader in MT adoption within the EU and has been able to develop its own Moses customization with limited resources and a lot of hard work and goodwill. I found many similarities to our own development and to how existing tools can be used to integrate MT within existing production environments. The organizers of Multilingual Web in Pisa have kindly made one of my presentations available (below). I have to thank Richard Ishida for the good co-ordination work, as well as the co-hosts, the Istituto di Informatica e Telematica and the Istituto di Linguistica Computazionale of the Consiglio Nazionale delle Ricerche.

Open Standards in Machine Translation