Toshiba and Pangeanic Take Steps Towards Machine Translation Hybridization – Article in AAMT

Pangeanic’s R&D team and Toshiba’s Knowledge Media Laboratory have published a joint article describing an initial hybrid pilot that sets the basis for future work on hybrid machine translation technologies between English and Japanese. The article appeared in the December issue (number 50) of the journal of the Asia-Pacific Association for Machine Translation (AAMT).

A copy of the article is available for download from PangeaMT’s site.

The article was co-written mainly by Ms Elia Yuste, Mr Manuel Herranz (the initiator of the project) and Mr Alexandre Helle (leader of the nipponization module at Pangeanic), with Toshiba’s input coming from Hirokazu Suzuki of the Corporate Research & Development Center of Toshiba Corporation.

The article describes progress made by combining the open-source statistical machine translation platform Moses with Toshiba’s rule-based system to obtain better output. Future work points to Pangeanic’s syntax-based approach, which integrates English and Japanese within a self-sufficient, self-learning, re-training module that forms part of the PangeaMT package, as presented by Manuel Herranz at the Japan Translation Federation in December 2011.

Next time you think languages, think Pangeanic

follow us on –> Follow manuelhrrnz on Twitter

European Development Grants for Machine Translation Research

FEDER fund award to advance Automation Algorithms in re-training and self-learning process in Statistical Machine Translation

Pangeanic has been awarded European Union funds under the FEDER programme through Valencia’s local government IMPIVA in order to advance our Statistical Machine Translation technologies.

The award number is IMIDTA/2011/777. This is an extremely valuable award for Pangeanic, as it enables the company to continue its innovation efforts and improve its MT offerings.
This award corroborates Pangeanic’s long-term drive to implement, develop and offer customized translation automation solutions that accelerate multilingual translation and reduce its cost.

Keywords: award, automation, algorithms, statistical machine translation

ESF fund award for R&D Statistical Machine Translation

Pangeanic has been awarded European Union funds under the ESF programme through Valencia’s local government IMPIVA for the R&D co-ordination work within the machine translation field carried out by Ms Elia Yuste, during 2011.

This is the second year running that Pangeanic has received these Research and Development Award funds.

The award number is IMEXPB/2011/1.

Keywords: research award, statistical machine translation


Conference round: TAUS, LW Silicon Valley, Tekom and JTF

Q4 was very intensive at Pangeanic: we attended four conferences to promote the benefits of the DIY SMT concept in the US, Europe and Japan. This post summarizes the best of these industry gatherings: TAUS and Localization World in Silicon Valley (and the cross-fertilization between them and the increasingly machine-translation-hungry language industry community), Tekom in Germany, and the one-day event in Tokyo at the Japan Translation Federation.

TAUS – Silicon Valley

TAUS has become the de facto “think tank” and executive gathering for anybody with an interest in MT. Its conferences always provide deep and clear insights into ongoing work, new developments and trends in MT. Over 100 people were present in Silicon Valley, with a good mixture of researchers, vendors, practitioners and users. Rather than summarizing the event, I would encourage readers to watch the videos and judge for themselves, and of course to attend the events.

Pangeanic was one of the founding members and has been a regular attendee and speaker since TAUS’s inception in 2004, so our recommendation to attend any sessions near you is well founded. The keynote was a brilliant address and summary of the current status of the industry, with some hints from Microsoft’s Bill Dolan which I find particularly relevant to future R&D in the field of a multilingual web. Do spend some time checking out the several initiatives which have built what Adobe calls a “harness” and others “a DIY” on Moses. It is rewarding to see so much interest in simplifying the complications of a Moses development and overcoming its limitations for company and commercial implementation. Certainly, taking the complexities away and bringing MT to true end-user level is the future (as we believe at Pangeanic).

Localization World – Silicon Valley

It would be hard to overlook the comments made in other blogs about the event becoming “lighter”, with less content. The now (in)famous “Moses Madness and Dead Flowers”, which drew more than inspiration from a Gartner report on the “hype cycle” (the similarities extend even to the graph), described it as an almost empty event. Since time is one of the most precious things in life, I spent most of mine giving a hand at the Translators Without Borders booth, one of the humanitarian efforts Pangeanic supports, together with Médecins Sans Frontières. It would be unfair to comment on the whole of the conference beyond the two talks I attended: the MT round table (obviously), which was pretty general, and the SEO session, which was rather informative for beginners like me.

Localization World is increasingly becoming a gathering where professionals and practitioners spend some days networking and pushing forward some of their ongoing projects. More use cases, independent of the conference theme, are needed so that attendees can relate to the experiences, learn from them and apply them.

Tekom

Ms Elia Yuste and Mr Andreas Thömel represented Pangeanic at the European Language Industry Association’s (ELIA) stand. Tekom differs from other events in that translation and its technologies are just one of the domains within its overall focus on technical communication. Nevertheless, it was clear that far more people need machine translation than are capable of customizing it. It is also clear that there is an ever-increasing need to publish more multilingual content, and faster, than ever before. There are simply not enough (good) translators around.

It is not a simple LSP gathering but a place with more technology showcases. Interestingly enough, we heard comments from other MT vendors that “I am repeating the same stuff I’ve been talking about for the last 2 years”. There is a need for evangelization for MT adoption indeed, but surely there has been innovation beyond the “sell cheap machine-translated words”!!

Japan Translation Federation

The Japan Translation Federation conference is the smallest of the four events. It is pretty much a domestic conference that attracts a lot of Japanese LSPs and technology vendors. I was very happy to see that all the opening sessions began with references to machine translation applications, in several shapes. Some presenters held up successes in Western languages as a guiding light for wider CAT and MT use in the Japanese localization industry. Only further research, funding and time will improve existing offerings into and out of Japanese.

I was invited to present Pangeanic’s syntax-based hybrid at one of the closing sessions to a very interested audience. The presentation can be downloaded from http://t.co/7VFkbmuI.

There is plenty of news and comment which could have made it into this entry, from CSA research to mergers and acquisitions. Undeniably, the market is ripe for further consolidation. This is something we will deal with in our next blog entries.

A very happy start to 2012 to all our readers!!

EPO & SIPO agree on Chinese-English machine translation for patents

Major breakthrough in enhancement of the global patent system

The European Patent Office (EPO) and the State Intellectual Property Office of the People’s Republic of China (SIPO) have signed an agreement which will have a striking impact on the improvement of the global patent system and the dissemination of technological information around the world. This is an unprecedented move to eliminate linguistic barriers in public access to patent information. The agreement was signed at their annual bilateral co-operation meeting, held in Chongqing. The resulting translation service will be free of charge and easily accessible over the Internet.

Both patent offices agreed to work together to ensure that, by 2012, automatic Chinese-English machine translation tools for patents will be available to the public. The agreement opens new possibilities for innovators and users of the patent system: it will provide access to a huge body of technological information which currently remains hidden due to language barriers.

“The agreement breaks new ground in the relationship between both regions in that it will bring the wealth of technology contained in patents to the fingertips of innovators on both sides, removing language as a delimiting factor,” said EPO President Benoît Battistelli (quoted from EPO’s official website). “The information function of patents cannot be rated high enough. Innovation is a global market, and by making their respective collections of patent documents accessible to researchers, scientists and inventors in Chinese and English, the EPO and the SIPO significantly contribute to strengthening the innovation process both in their regions and at worldwide level. Especially small and medium-sized enterprises, as well as research institutions, stand to benefit from this improved access to information on new technologies.”

The importance of patent information has grown significantly in recent years. With the advent of a truly global technology market, the number of patent applications filed worldwide is growing annually and reached some 1.8 million filings in 2010, according to the World Intellectual Property Organization. Many of these applications originate from China and Europe, or take legal effect in these regions. Monitoring technical developments disclosed in patents is vital for innovating businesses in order to stay competitive.

Furthermore, rendering these documents accessible to the public for general information by offering automated on-the-fly translations can only help innovators to better adjust their R&D and also their investment strategies.

Public availability of documentation in Chinese and English will also enhance the efficiency of dissemination of information on new technologies disclosed in both regions. Furthermore, it will improve the quality of the patent granting process since Chinese prior art will be better considered globally.

Implementation of Machine Translation: Pangeanic case study at TMS Inspiration Days

Pangeanic has been invited as one of the 3 guest speakers at TMS Inspiration Days (19-20 April 2012) to showcase its transition from Language Service Provider to machine translation software application vendor. Manuel Herranz’s talk will deal with the initial application of statistical models as a solution to increasing demand from automotive clients and how this internal project changed the company’s DNA. PangeaMT, Pangeanic’s feature-rich DIY SMT solution, will be explained as a tool developed with the specific needs of the localization industry in mind. As language companies free themselves from TM syndromes, they are embracing MT as a tool that offers unquestionable productivity enhancements and a competitive edge. Concepts like BLEU and METEOR will be demystified and we will learn how to measure MT output scientifically, beyond sales talk.
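For readers who have not met BLEU before, the following is a minimal sketch of the idea in Python, not the tooling used by PangeaMT or the talk. It scores one candidate sentence against a single reference using clipped n-gram precision and a brevity penalty. The function names, the whitespace tokenization and the lack of smoothing are all assumptions made for illustration; in practice BLEU is reported over a whole test set rather than sentence by sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference, unsmoothed BLEU for one sentence pair (0.0 to 1.0).

    Illustrative only: tokenization is a plain whitespace split.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # unsmoothed: one empty precision zeroes the score
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(log_precision_sum / max_n)
```

An identical candidate and reference score 1.0, and a candidate sharing no words with the reference scores 0.0. Everything in between depends heavily on the domain and language pair, which is why raw percentages are hard to compare across engines.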

In our mission to provide the tools to those who need them, we will analyse the positive effect of data cleaning and data preparation (part of the new PangeaMT set) as well as what percentages mean in different languages and domains. Building on our efforts to provide technology and empower users, we will also touch upon concepts like bilingual and monolingual language models and their usefulness, the work behind hybridization and the freedom that PangeaMT provides with its auto-updating features once the initial engines are set in motion.
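As a rough illustration of what data cleaning can mean before training an SMT engine, here is a small Python sketch. It is not PangeaMT's cleaning module: the function name, the thresholds and the whitespace tokenization are assumptions for the example (word counts by whitespace would not suit Japanese, for instance). It drops pairs with an empty side, overlong sentences, pairs with a suspicious length ratio and exact duplicates.

```python
def clean_parallel(pairs, max_len=80, max_ratio=3.0):
    """Filter (source, target) sentence pairs; thresholds are illustrative."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace so trivially different duplicates collapse.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # one side is empty
        if n_src > max_len or n_tgt > max_len:
            continue  # overlong sentences tend to align poorly
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # length mismatch suggests a misaligned pair
        if (src, tgt) in seen:
            continue  # exact duplicate
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


corpus = [
    ("Hello  world", "Hola mundo"),
    ("Hello world", "Hola mundo"),   # duplicate after normalization
    ("", "x"),                       # empty source
    ("a", "b c d e"),                # length ratio 4 exceeds 3.0
    ("ok", "vale"),
]
cleaned = clean_parallel(corpus)
```

Filters of this kind are cheap to run, and they remove the sort of noise that otherwise ends up in the translation tables.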

Pangeanic’s DIY SMT solution has revolutionized the localization industry by providing a service that not only harnesses SMT tools like Moses, but also offers system statistics, prepares data with cleaning tools, creates engines at will after Pangeanic’s first trainings and, above all, offers the freedom to retrain and update engines at will. Pangeanic also offers a SaaS model for those who do not want to host the solution in-house, with very similar approaches to data cleaning, preparation, engine creation and re-training. For typical implementations at corporations, see the joint presentation with Sybase, an SAP company.

TMS Inspiration Days is an international conference focusing on the business and technology aspects of the translation industry. This 3rd edition of the event will be held from 19th to 20th April 2012 in Krakow (Poland) under the banner of “Technology for business”. The conference agenda already includes two other presentations: “Keynote: Overview of translation technology” by CSA’s Ben Sargent and “Selling in America” by Renato Beninatto.

The conference will begin with the “Keynote: Overview of translation technology” lecture by Ben Sargent from Common Sense Advisory. Ben has been involved in the translation industry since 1989. At CSA, he focuses on technology-related areas, particularly dealing with CMS and TMS tools. The lecture will focus on the major technological trends and solutions available on the translation market and their impact on the functioning of an enterprise.

 


Science to help (machine) translation: Using Linguistic Information for Hybrid MT Barcelona

by Manuel Herranz

One very important aspect that is often overlooked in the machine translation field and discussions is that machine translation is one variant of a more general science called pattern recognition and machine learning. MT marketing staff often overlook the rationale behind the maths they do not understand (I don’t claim to understand it all myself!!). On the other hand, linguists tend to concentrate on solving the impossible by overrating the importance of rules and linguistic data within MT systems.

Therefore, before offering some information about the events Pangeanic has been involved in and where its DIY SMT (or Machine Translation for the Masses, as it has been called) has been present in one way or another, it would be useful to read an interview with Enrique Vidal from Valencia’s Polytechnic University, one of the major figures in pattern recognition and machine learning in the world. Sr Vidal recently received the National Prize for Computing 2011. The interview is in Spanish, but well worth reading if you want to put machine translation in context within the larger scientific domains to which it belongs. MT is not just about selling output, and of course not just about computing, but about finding the (in)correct patterns (and then adding some specific features). Coincidentally, news about code-cracking helping to decipher an untranslatable text (the Copiale Cipher) appeared days later in The New York Times.

There is no room here to summarise our attendance at TAUS Silicon Valley and Localization World. I will leave that for our next post, together with the upcoming exhibition at the Japan Translation Federation (JTF) Festival. However, I would like to point out that some of the most valuable input I was able to gather about real advances in MT came from the academic gathering at the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011) and the practical Saturday session ML4HMT (META-NET WP2), held in conjunction with DFKI. These sessions were not for the faint-hearted or for commercially driven minds. They really addressed those with a personal interest in taking the best of MT research and applying it to development. It was technical, and also trend-setting. As different research teams worldwide face the same problems, approaches are emerging all over the world, from Hong Kong to Spain, from the US to Norway: some similar, some new, some quite imaginative. The advances published at the International Workshop on Using Linguistic Information for Hybrid MT will set the agenda for features that sooner or later will be integrated into future MT offerings. For example:

  • Lemmatisation and annotation for morphologically rich languages, for example Czech and Basque, the latter also being a lesser-resourced language.
  • Syntax-based approaches and word re-ordering for very unrelated languages (such as Asian or Semitic languages into and out of European languages)
  • Web-based annotation tools
  • Hybridisation of techniques, starting from analysis at a morphological layer, then analytical layers, tectogrammatical layers, and then transfer, and on to synthesis to t-layers, a-layer and m-layer.
  • Word disambiguation
  • Mixture of rule-based and statistical approaches to improve predictability.
  • Post-editing effort estimation for MT systems, comparing systems that include no linguistic features with systems that include some. Linguistic features are relevant for directly useful error detection and for automatic post-editing, but for sentence-level confidence estimation (CE) there are issues with sparsity and with representation (length bias).
  • New metrics like VERTa, using linguistic knowledge organised in different levels (lexical, morphological, syntactic information and sentence semantics)

A very intensive 4 weeks which included TAUS Santa Clara and Localization World Silicon Valley, and these science-driven MT workshops in Barcelona: three different venues to put to the market the best of research, to learn and to develop.


Pangeanic supports Translators Without Borders

Pangeanic, a leading provider of machine-translation technologies and multilingual communication services worldwide, announced its financial support, via a silver sponsorship, for Translators without Borders (TWB), a non-profit organization dedicated to helping NGOs extend their humanitarian work by providing free, professional translations. The announcement was made public at Localization World Barcelona and was re-affirmed with Pangeanic’s assistance at the Translators without Borders booth at Localization World in Santa Clara.

Translators without Borders recently launched their online Translation Center, an automated platform through which NGOs approved by TWB can request translations, and TWB translators can accept and deliver the work on a pro bono basis.

TWB translates more than 2 million words a year for NGOs, and all work is carried out on a volunteer basis. Thanks to  the support of donors and sponsors, TWB plans before the end of the year to hire a program coordinator to lead the initiative to assist more NGOs in more languages.

Manuel Herranz, Pangeanic’s CEO, stated: “Translators without Borders is filling a very important vacuum in our industry, often seen as a low-profile, quick turn-around service industry. TWB can provide a crucial service by helping to eliminate language barriers in times of crisis so NGOs and aid groups can help as many people as possible. We at Pangeanic are thrilled and honored to be Silver Sponsors and support TWB’s expansion, and look forward to offering our help with our machine-translation technologies and as a language service provider.”

Translators without Borders Co-founder Lori Thicke added: “We are very excited to have Pangeanic onboard as a corporate silver sponsor. In a crisis, aid groups often rely on translators in life-or-death situations, and at times of dire need refugees from political or natural disasters end up living in a vacuum due to the lack of appropriate translators. Every effort commercial language providers can make to give back to society by providing something beyond their commercial services is a real help to communities at risk all over the world.”

About Translators without Borders
The mission of Translators without Borders (initially founded in France as Traducteurs sans frontieres) is to translate knowledge for humanity. Translators without Borders, a non-profit organization, has met that mission through quality humanitarian translations provided by a community of trained translators to vetted NGOs who focus on health, nutrition and education. On average, Translators without Borders volunteers translate millions of words per year, focusing on three types of humanitarian translations: crisis translations needed urgently to inform people in crisis, translations that support an NGO’s operations, and educational translations that directly support people in need. The non-profit organization’s vision is to expand its open digital platform to help make knowledge more accessible worldwide through humanitarian translations. The initial goal is to assist humanitarian organizations with two million dollars worth of free translations per year. For more information, please visit http://translatorswithoutborders.com.


Facebook to add automated translation services for posts

With over 750 million accounts, Facebook has users in nearly every country in the world, and it has been described as the equivalent of one of the most populated countries in the world. It was recently valued at US$65 billion. Not bad for a company that, at its most basic level, lets people chat and share pictures and that, at its highest, connects them.

Its success has cast a shadow over other online companies such as Yahoo!, which fired its CEO Carol Bartz by phone as the company struggles to keep up with other online giants.

Facebook faces a “problem” common to many large countries: multilingualism (if you consider that a problem), or rather the fact that it holds communities which do not interact with each other because of a language barrier (and that is a problem in the real world and in the digital world).

However, according to an Inside Facebook post on 2nd September, the social media site has started to experiment with an automated translation service to help bridge the communication gap between its communities.  Facebook  already crowdsourced the translation of its site to several languages, connecting millions of people to each other around the world in new and sometimes unexpected ways.

The new “Translate” button sits next to the “Like” button and apparently does a good job of translating not only standard words but also slang phrases. No details about the engine or technology behind the tool have been disclosed, although it is likely that the company, given its “crowdsource” philosophy, has adopted open-source technologies rather than choosing to develop one from scratch.

The picture below (courtesy of Inside Facebook) shows a translation of the phrase, “Totally cool” from Hebrew to English.

As happens with other live translation services, once the post has been translated the button changes its status to “Original”, so users can see the source text that was originally entered.

This machine translation functionality will undoubtedly be particularly useful for multinational organizations. Their Facebook pages receive comments from all over the world, which are of course in different languages. Gathering this wealth of real-life user feedback is the realm of some sentiment analysis firms. Twitter and blogs are open-web resources that let commercial firms learn what customers think and how they react to their products, services and events. However, Facebook is a closed-web environment which cannot be crawled or data-mined so easily. Until now, the only solution has been for people to cut and paste the comments into a free web translation service like Firefox’s imtranslator or, for the more sophisticated or corporate user who requires a private translation process, to build their own. Facebook’s embedded machine translation feature would then be a timesaver not just for users, but also for commercial applications needing to know closed-web or community-based opinions. Facebook’s translation currently only supports Spanish, French, Hebrew, Chinese, and English.

The feature is currently only available on Facebook Pages and not on profiles or apps. Nevertheless, if it works as Facebook plans, we can expect to see it rolled out for the entire site in the near future.

Audience Growth on Facebook: Top 25 Country Markets

All displayed data current as of July 1, 2011.  (Courtesy of Inside Facebook)

For Europe, no (new) CAT tool is good enough

by Manuel Herranz

And why should it be? Decisions coming from Brussels tend to be misunderstood, shallowly analyzed or directly criticized whichever way the wind blows. Let us remember 2010’s first ever report on the Size of the Translation Industry in Europe, which offered a very comprehensive view of the current status, country by country, with facts and figures across several areas, even if revenues could only take certain activities into account. It also contained words and forecasts from personalities in the industry. Liking reports is like choosing a favourite colour: everyone has their own preference. Nevertheless, it provided detailed information where there was none.

However, the decision not to award the contract to any CAT tool on the market points to a very clear state of affairs in the language industry: despite massive innovations in computing (from open clouds to internal or managed clouds: Eucalyptus (compatible with Amazon EC2), OpenNebula, the solid Ubuntu Enterprise Cloud and the latest arrival, OpenStack, which I envisage will be a winner), the advent of SaaS models and even great advances in machine translation, no existing tool is exciting enough to justify a 5M€ expenditure of taxpayers’ money.

The story goes like this: the EU’s Directorate General for Translation (DGT) published a Call during early 2010 to substitute the existing CAT system (Trados 2007) with more modern technology. It is to be assumed that all the major players in the CAT market will have put in a tender according to specifications. The latter may be more or less to the bidders’ liking, but every administration, so long as it is the repository and granter of public funds, has to administer them wisely. Given the fact that there are some 4,600 staff translators working at the EU, and that the EU is by far the largest producer and consumer of translation services, the backing of one option over others would have set a massive market trend for the years to come.

I had the pleasure of sharing open-source MT solutions as an invited speaker last April in Brussels. I saw first-hand the internal drive to introduce Moses and Apertium as solutions which can set a minimum standard upon which to build a solution (or at least set a trend). I was particularly impressed by the work done internally by the Portuguese Department, which with minimum staff and resources was able to set up a small Moses-based solution that fitted their needs, giving preference to translation domains by choosing translation tables. They also did this following a TMX workflow and TM update with penalisation, which mirrors our own early stages in MT. I could only congratulate and praise their work. Other presentations, from Dr Sharon O’Brien and Dr Andreas Eisele, pointed out that translator acceptance is a key point, as productivity increases are nowadays beyond question (whether they are 30%, 50% or 300% remains the case study for in-domain machine-translation presentations). Progress made internally at the EU was presented and reported at TAUS Barcelona (see previous blog entry for a summary).

Going back to the decision not to replace the existing CAT tool, the message is clear:

  • There is no justifiable leap in quality in existing CAT tools.
  • CAT leveraging, as a technique to make the most from previous translations, has reached its ceiling.
  • There is a lack of on-line help documentation in CAT tools.
  • There is hardly any justification for more CAT tools unless they offer something truly revolutionary. (Now, there are several new tools which do make sense at LSP and corporate level, but not to the extent and cost the Directorate General for Translation required).

Some have seen a dark hand and there has been controversy – finally settled by the chairwoman of the committee explaining why there was no chosen one among the candidates. The explanation was made public in The Tool Kit. There was simply a lack of adherence to the requirements and no true innovation.

Personally, my favourite CAT tool has been Swordfish for a long time: it is nimble, agile, easy to use, built on and favouring open standards, compatible with all major formats, and it contains good QA features. The latest version even adds a powerful LAN TM collaborative option. On the downside, its useful Google Translate plug-in will probably be hit by Google’s decision to deprecate its free API, something that had been on the cards for some time. At a fraction of the cost of other tools, it gets the job done pretty well. Sadly, Swordfish is not based in Europe and most probably did not enter the tender.

To conclude, I am not only justifying the DGT’s decision not to award a 5M€ contract to any CAT tool provider (not even the latest versions of Trados in the shape of SDL 2009 made it). I am saying that it was the only likely outcome.

Why? Look around at the new machine-translation offerings and how these will become essential, and perhaps part of everyday life, in a matter of years (check a possible future by Andrew Joscelyne, where machine-translation engine creation becomes so easy and so common as to be the main work LSPs have to offer). Look at how the gathering of language resources is being automated by initiatives such as Panacea, or the work done for lesser-resourced languages by Let’sMT and Tilde in particular.

Any doubts? Think about the advantages of an integrated, stable XLIFF workflow for documentation which not only leverages your existing content from a fast database system (not a “translation memory”), but also uses it to create your own MT ecosystem in the background, growing with every new piece of translated content you feed into it. Apologies for the DIY SMT self-promotion :) .

So, how long until we see a really ground-breaking, open-source CAT+MT ecosystem tool?

Further reading:
  • Joanna Gough, A troubled relationship: the compatibility of CAT tools, http://www.translationautomation.com/technology/a-troubled-relationship-the-compatibility-of-cat-tools.html
  • Ben Sargent, The “Un-cancelling” of the EC Translation Technology Tender http://www.commonsenseadvisory.com/Default.aspx?Contenttype=ArticleDetAD&tabID=63&Aid=1479&moduleId=390
  • Achim Ruopp, Will there be a thousand Moses MT systems? http://www.translationautomation.com/technology/will-there-be-a-thousand-moses-mt-systems.html


LocWorld & TAUS Barcelona 2011 – Interoperability and DIY SMT

This is a summary of what was learnt, discussed and exchanged during our attendance at both the TAUS and LocWorld events in Barcelona. It highlights many of the issues that feature high not only in the translation industry but in the software industry in general.

LOCALIZATION WORLD

Pangeanic and its machine translation division PangeaMT were well represented at the Localization World edition in Barcelona in June 2011. Manuel Herranz, CEO, who had spoken at the TAUS Executive Forum in Barcelona the week before, was joined on this occasion by Elia Yuste (Business Development Manager and PangeaMT Lead) and Antonio Lagarda (PangeaMT R&D).

The announcement that Pangeanic’s PangeaMT package will provide users with a customized API to its different engines became a hot topic after the news that Google will deprecate its free Translate API.

Toni and Manuel busy explaining the advantages of open-source, DIY SMT

PangeaMT’s philosophy is rather different in coverage, as it develops domain-specific engines and thus its APIs can make calls to specialist engines (tourism, real estate, engineering, electronics, legal, even marketing). The PangeaMT booth was visited constantly by translation specialists, consultants and practitioners operating internationally and looking for customizable, scalable, open-source MT solutions.

This was an extraordinary setting to discuss collaboration and representation avenues and sales opportunities and, most importantly, to demo the powerful PangeaMT DIY technology, which enables users to build their own MT solution with their own data, either in-house or with a SaaS model.


Apart from exhibiting PangeaMT, with an additional focus on the post-editing and other top-notch linguistic services that Pangeanic offers, Manuel Herranz and Elia Yuste also took part in the conference program as speakers in the E7 track, hosted by Jaap van der Meer (TAUS) and entitled MT Experiences at IBM, Sony Europe and Sybase. Fausto Prastaro (Sony UK) with Elia Yuste, and Kerstin Bier (Sybase, an SAP company) with Manuel Herranz, presented their two respective use cases – click here to view.

Elia (PangeaMT) and Fausto (Sony Europe) presenting results from a customized MT solution in marketing and professional electronics

Pangeanic / PangeaMT were pleased to cooperate with the Localization World Organizing Committee and to sponsor the Gala Dinner, an event attended by over 400 international delegates out of a total of 550 attending this edition of the conference, the largest in number of participants and one of the most successful editions to date.

Manuel, Elia and Toni at Pangeanic-sponsored dinner, Localization World Barcelona 2011

TAUS

Interoperability is the key point and the buzzword for the future if you are dealing with software or translation, and even more so if you are producing translation software (machine translation software in our case). In a nutshell:

  • Lack of interoperability costs money, even though most vendors and clients can’t quantify it.
  • Some are calling for organizations or leaders.
  • TM leverage drops by 20% when switching vendors because different tools are used.
  • Some translators simply refuse to use one tool or another. Some tools also make translators less productive, thus pushing rates up.
  • The current mix of free, cloud-based, licensed, SaaS and LSP-hosted tools is too much of an offering, as they do not talk to each other. Perhaps new models are required. Human translators’ lives should be made easier, rather than having them focus on handling formatting.

Several industry players (Iris Orriss from Microsoft, Karen Combe from PTC, Minette Norman from Autodesk and Eric Blassin from Lionbridge) stated that there are interoperability issues between software and documentation, that these cost money, and that a CMS doesn’t work well with a TMS even when both come from the same supplier. Suppliers can’t find ways to keep up and solve interoperability problems. Ideally, UI should be used together with documentation, but this is very difficult and there is no budget for it, even though inconsistencies do occur. LSPs bear the burden of the costs, with reduced potential for innovation and efficiency. You are at the mercy of the tools, and sometimes it takes trial and error to find the best solution. Lack of interoperability is frustrating.

At TAUS, Smith Yewell, CEO of Welocalize, commented that lack of interoperability impairs productivity and undermines profitability. According to Yewell, Welocalize calculated that passing formats back and forth, plus the manual work involved, costs it $3M. Because formatting affects the transfer of data from its repositories (TMS or databases in MySQL, Oracle) to the translation environment (Trados), quality issues arise, as the wrong translator can be matched to the wrong job. Personally, I would agree that this is the case with most LSPs facing conversion problems between different publishing, web and other formats, from InDesign to Flash, HTML and even DOC/RTF. Having a single, across-the-industry format would level the playing field and increase efficiency.

Interesting presentations came from the EU, which is the largest buyer of translation services. They are embracing the Moses platform to solve part of their problems in more than 400 language combinations. The task they face is massive, as they have to work with any kind of combination, from straightforward Romance languages and English to morphologically rich Eastern European languages and agglutinative Finnish and Hungarian… A daunting task, but there are also many EU-sponsored R&D programs which can eventually feed back and help with the solution.

Spyridon Pilos (EU’s DGT) stated that in May 2010 a Commission task force was confirmed to address the need for MT. In December 2010 the ECMT service (a rule-based system, Systran) was suspended, so the EU is looking at open-source software. The new name is MT@EC and it has to be built on trust, confidentiality and continuity. The EU is building a data-driven system using all its internal TMs, cleaning, preparing, filtering and processing them for MT. Benchmarks are established internally with basic Moses releases; then they will set up SMT engines and develop user interfaces and tools for capturing feedback in order to improve them. They are also using and evaluating Apertium.
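
To give an idea of what cleaning and filtering TMs for MT training can involve, here is a minimal sketch in Python. It is only an illustration under our own assumptions, not the EU's actual pipeline: the function name clean_segment_pairs and the thresholds max_tokens and max_ratio are hypothetical, loosely modelled on the kind of length and ratio filtering commonly applied to parallel data before training an SMT engine.

```python
from typing import Iterable, Iterator, Tuple

SegmentPair = Tuple[str, str]


def clean_segment_pairs(
    pairs: Iterable[SegmentPair],
    max_tokens: int = 80,     # hypothetical threshold, not from the talk
    max_ratio: float = 3.0,   # hypothetical source/target length ratio limit
) -> Iterator[SegmentPair]:
    """Yield TM segment pairs that look usable as SMT training data."""
    seen = set()
    for source, target in pairs:
        # Normalise whitespace so spacing differences do not hide duplicates.
        source = " ".join(source.split())
        target = " ".join(target.split())
        src_len = len(source.split())
        tgt_len = len(target.split())
        # Drop pairs where either side is empty.
        if src_len == 0 or tgt_len == 0:
            continue
        # Drop overly long segments.
        if src_len > max_tokens or tgt_len > max_tokens:
            continue
        # Drop pairs whose lengths differ too much (likely misaligned).
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue
        # Drop exact duplicates.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        yield source, target


# Made-up example segments, for illustration only.
tm = [
    ("Press the  power button.", "Pulse el botón de encendido."),
    ("Press the power button.", "Pulse el botón de encendido."),
    ("Warning", ""),
    ("OK", "Pulse el botón de encendido para apagar el dispositivo ahora."),
]
cleaned = list(clean_segment_pairs(tm))
```

In this made-up example only the first pair survives: the second is a duplicate once whitespace is normalised, the third has an empty target, and the fourth fails the length-ratio check. A real setup would tune the thresholds per language pair and add further steps such as tag stripping and tokenisation.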
