digital business

Understanding Machine Translation Customization and DIY MT

by Manuel Herranz

The same mistake that was made by many translation agencies, translation companies and now language service providers is being made by tough machine translation companies. “My (machine) translation is better than yours”, “my machine translation system works, everybody else’s doesn’t”. Translation companies have learnt that they cannot sell translation services on “translation quality claims” alone or on “I am better than you because…”, but it seems that some machine translation companies have yet to learn the same lesson. I am referring particularly to those with risky levels of investment or venture capital to repay and without the testing ground of in-house native speakers or a real translation department in which to test their technologies and MT before release. At times, such companies obtained their “high quality clean data” by bombarding Google Translate and applying cleaning cycles which included manual revision by local, non-native graduates. Many LSPs fall for the big marketing campaigns, […]

Human Translation or Machine Translation – What’s Best for Me?

For some people, using a translation software program to translate a piece of text from one language to another is enough. They believe the current state of the art in machine translation is good enough to stand in for human translation. It would be naive to believe this always works. We have proven at Pangeanic that it works in applied contexts: when we are dealing with a particular domain, when there is enough clean data and when certain conditions apply. Please refer to the many presentations we have given since 2009 on the use of applied machine translation, particularly to speed up the translation of documentation. But as we all know, it takes a lot more than just software. The application of unrestricted, universal machine translation will take some time. In fact, it would not be fair to talk about “machine translation” in general but rather about language combinations (English/Spanish/French/Portuguese/Scandinavian) in which it is undoubtedly successful, whilst in other languages certain nuances make […]

Translation Technologies at LocWorld (Part 2: Practitioners)

by Manuel Herranz

I will describe the rest of the very interesting Pre-conference Day and the presentations by the organizers (TAUS) as well as by three other companies which are either machine translation developers or practitioners of automated translation solutions. The presentations brought different perspectives to the machine translation landscape, with efforts and advancements by several companies, including Pangeanic. Maxim Khalilov from TAUS summarized the good work being done by the organization within the MosesCore project. His presentation was an invitation to visit and find out more about tools, data and resources. Amongst the tools, he mentioned alternatives to Moses (Thot, for instance), as well as a collection of TAUS features on quality evaluation. Important for new entrants or those with an interest in MT is the collection of data at TAUS: Europarl (1.8 million sentences), JRC-Acquis (270 paragraphs), Hansards, UN, OPUS, LDC (Linguistic Data Consortium), ELRA, as well as TAUS’ own repositories, […]

Multilingual web is more than translation (1/2)

by Manuel Herranz

It is beyond doubt that the web has become multilingual. The work, experiences and cross-pollination with other disciplines, from machine translation to localization and semantics, were shared at the EU-sponsored Multilingual Web event which took place in Rome on 12-13 March 2013. Whilst technologies such as machine translation are already well integrated for fast web page translation, it was reassuring to see that even large web actors, such as Google, consider there is plenty of work to do in making the web truly multilingual. The release of ITS 2 and the new features and possibilities that HTML5 opens up made the venue a meeting point for professionals, practitioners and academics dealing with the semantic web, translation and applied machine translation, as well as CMS tool providers. Google’s experiences were shared by Mark Davis and Vladimir Weinstein, who pinpointed translation and localization issues which are often overlooked. We already assume that a page can […]

10th Biennial Conference of the Association for Machine Translation in the Americas

AMTA-2012, the 10th biennial conference of the Association for Machine Translation in the Americas, will be held at the Catamaran Resort Hotel in San Diego (California) from Sunday, October 28 to Thursday, November 1. As in previous editions, AMTA will take place right after the Annual Conference of the American Translators Association (ATA), which is holding its 53rd edition from October 24 to 27, also in San Diego. Both conferences are coordinating program content around joint topics of interest. Conference content has been designed as a cross-fertilization between researchers’ lines of work, developers of machine translation products and services, and the understanding by both of the needs of the translation industry and human translators, whilst fostering the understanding of modern machine translation technology and the role of advanced translation automation in enterprise globalization and commercial translation processes by the ultimate practitioners of the technology, the human translators – upon whose growing […]

Moses is not the new Messiah

by Manuel Herranz

If you run a translation company or translation department, or have some sort of connection with the translation industry, you have noticed without a doubt that MT (or automatic translation) is the flavor of the year in 2010… and will be for many years to come. It has changed and will change the way we do things in this industry. Several factors have contributed: an unstoppable increase in the globalization of services and support, smaller budgets from buyers, an increase in international trading of services and the need for more content and more multilingual content in more languages. As of May 2009, there were 487 billion gigabytes of data which were increasing 50% a year (Oracle) or doubling every 11 hours (IBM). There are both exogenous and endogenous factors for things to reach maturity level now and not earlier or later. Among the latter factors we may include the fact that the bases did already […]

Breaking News – a few days later: “Aha, SDL has bought Language Weaver! So what?”

by Elia Yuste & Manuel Herranz

SDL announced the purchase of Language Weaver (LW) on Thursday 15th July 2010. General media as well as GILT industry experts have spread the news in no time in the form of newsletter updates, opinion articles in professional networks, blog entries and tweets. The news coverage has been phenomenal. Everyone has been asking about the whys, the hows and the thereafters of this financial move by SDL. We agree with some leading industry analysts that perhaps Language Weaver’s future will not be as rosy as it may seem. Perhaps it will meet the same fate as Idiom once did, going from market establishment to discontinued product support following its acquisition by big buyer SDL. As an independent LSP offering its own customized MT technology, Pangeanic is in a privileged position to offer dedicated client-focused solutions, based on innovative MT and, if required, post-editing […]

IBM + Lionbridge MT agreement – What does it mean for the industry?

by Manuel Herranz

The news of the month has undoubtedly been the announcement by Lionbridge that it will partner with IBM to develop (and probably offer) machine translation solutions. Possibly, the intention is to offer the advantages of MT to Lionbridge’s existing clients and maybe to control the technology. After all, whoever controls the technology has a good chance of gaining (or consolidating) market dominance. The move must be welcomed by all true believers in MT as the (new?) force of change in the translation industry. However, even though Lionbridge is the biggest language company jumping into the MT-DIY boat, it is not the first one to offer MT+PE as a substitute for the (increasingly old-fashioned?) TM or T+E+P models. With translator production reaching 850-1000+ words per hour and mounting production and price pressures, 20th-century technologies seem too cumbersome and antiquated for the demands of multilingual digital content. Who […]

BBC debate demonstrates power of real time machine translation

MT is in the news. On 4th March, The Economist published a review of what the web might feel like without linguistic barriers. “Cyber-multilinguism” is increasingly a reflection of the world we live in. A few days later (9th March) The New York Times ran a comparison of machine-translated texts from French, Spanish, Russian, German and Arabic into English and the quality obtained from several engines. The texts included works of literature, which may well have been in the training material of the engines, but also current affairs and news clips. Google presented a cell phone link to MT. Last week, the BBC conducted a new experiment to test how people and the media can make use of MT and really broaden their horizons with it, even if only in a very general way. MT users seemed happy with 80%-90% accuracy. A couple of findings I thought were […]

Translation guys should not miss Google’s predictions

by Manuel Herranz

This will not be a long article or comment, as the source speaks for itself. There is very interesting food for thought in line with this month’s Pangeanic blog posts. The article is a 3-page conversation with Senior Exec Alan Eustace on Innovation Strategy and the Technology. You need to read all 3 pages (slowly, digesting them) in order to get the full picture. I quote the most interesting bit for translation professionals (there are more interesting quotes): “Machine translation will become ubiquitous and as good as human translation, so the language barrier will be gone. All mobile devices will have speech input. Having all local information—maps, directions, and so forth—will be commonplace.” Now, how’s that for clarity and committed statements? What are the implications for translation companies and the whole of the Translation Memory-dependent industry? (If you need the source, it is page 3 following […]