TAUS

Maxim Khalikov from Booking.com

Some takeaways from TAUS Summit Portland

The TAUS Yearly Summit in Portland was a great event and the largest I have attended so far (and I have been a regular attendee since Brussels in 2007). The organization has clearly grown from a think tank promoting the exchange of data for training automatic translation engines into a developer of useful tools for the industry. There were times when only experts and a few EU officials or managers from large corporations attended. The mix of people in the audience and the quality of the keynotes prove that TAUS has become a major reference conference for decision makers and translation technology implementers in the language industry, far removed from the conferences of service LSPs. "We are going to be postediting and leaving the TM syndrome behind. Translators will need to face the reality and the realm is post-editing" – Tony O’Dowd, CEO, KantanMT. Unfortunately, I missed the first day of […]

Evolution of the language technology landscape – TAUS Tokyo

by Manuel Herranz. I attended the last TAUS meeting in Tokyo. This organization has come a long way in promoting machine translation among translation professionals, primarily translation buyers. Corporations like Microsoft, Adobe, Dell, eBay, etc., donated large bilingual data sets, which allowed companies to advance the state of machine translation and to run hundreds of tests with Moses in order to improve accuracy and find better ways to make machine translation a reality, one we now find embedded in so many products and take for granted. Pangeanic’s drive to create and develop innovative language solutions for its clients led us to create a new section called PangeaMT, which was the first to use Moses in a commercial setting back in 2009 and served its clients with language automation. Nowadays, it seems that widespread adoption in the wake of solutions provided by non-industry giants like Google and Microsoft has created […]

Hybrid Machine Translation Use Case at TAUS Tokyo Forum

by Hirokazu Suzuki and Manuel Herranz. The second TAUS Tokyo Forum was held on April 19th and 20th, 2012 at the Aoyama Centre, hosted by Oracle Japan. The Forum had to be cancelled last year as a result of the earthquake disaster which hit Japan on March 11th, 2011. Otherwise, the forum would have run its third edition this year. All of the participants from Japan were delighted to be able to take part in the forum again. The main topic of the TAUS Executive Forum was use cases of MT technologies and innovative business models. Dieu Tran (Cisco) and Alan Chung (SDL), who received a TAUS award for the best use case this time, talked about their integrated MT/TM system that makes effective use of SMT. Suguru Sakanishi (Yaraku, Inc.) and Miori Sagara (Baobab) presented their collaborative translation platforms that combine MT technology and human resources. Crowdsourcing translation with MT technologies […]

Pangeanic’s participation in TAUS Copenhagen 2010

by Elia Yuste. TAUS has been tracking the exciting experiences of companies pioneering a radically new MT engine training space for the last year or so. Pangeanic is one of the most outstanding cases, and so we were announced as the first LSP to create a new business stream with TAUS Data Association (TDA) data earlier this year. Then PangeaMT, Pangeanic's technological division geared toward customized MT solutions and consulting, was invited to take part in the proof of concept of the TAUS MT Trainer and present its results at the TAUS Executive Forum in Copenhagen in late May 2010. The idea behind this MT Trainer, a web-based facility from TAUS TDA that will materialize within the current year, is twofold: first, to foster proactive adoption of TDA data for MT engine training; and second, to connect MT service commissioners and providers under the TAUS umbrella, whereby the former may submit their data files (reference files for engine training […]

Comment to SDL’s “Sharing Data between Companies – is it the Holy Grail?”

by Manuel Herranz. Eye openers about data sharing (or data mixing) abound nowadays. The kick start for TM leveraging, automation and faster solutions has come from outside our beloved language industry in two shapes: first, algorithms that create language (SMT) and their application/business by players inside and outside the industry (from Google Translate to new MT entrants and offshoots); and second, a credit crunch and a financial crisis that is leading companies to rethink the unthinkable. On a few (exceptional) occasions, language professionals have joined forces to actually innovate and come up with something really new, mostly crowdsourcing, in translation, in frameworks, in workflows. Never mind, busy people seldom have the time to innovate. It takes a shot from outside a particular industry to shake the foundations or to force things to change. (Let’s assume from a positivist point of view that change is for the better). […]

Improving the quality of a customized SMT system using shared training data

At the MT Summit in Ottawa (August 28, 2009), Microsoft’s Chris Wendt presented the findings from a recent pilot project using translation memories from more than ten TDA members to train the Microsoft statistical machine translation engine. The main tests were performed in two languages, Chinese and German, with customization for Sybase iAnywhere. Additional tests were also run on Polish and Japanese with customization for Adobe and Dell. BLEU scores went up significantly, with increases between 22% and 74% compared to engines trained purely on Microsoft or generally available data. These tests point to better quality results and improved system performance when more parallel data from other organizations is added, in this case data shared through the TAUS platform. The seminal presentation is titled "Improving the quality of a customized SMT system using shared training data". Next time you think […]

Technical guide to SMT Training Data

A new guide for anyone interested in working with MT and using their own data to create machine translation engines has been published by TAUS on its website. The technical guide to SMT Training Data is intended for users and any organization keen to train engines with its own data. It deals with the preparation of translation training data for statistical machine translation. It examines the processes for data preparation (typically bilingual TMX), which are the catalysts that enable data and algorithms to work together. TAUS’ report by Tom Hoar also explores how to define an organization’s training data strategy to match overall system design, identifying potential data sources for bilingual, well-aligned TMX. It also discusses the challenges faced when merging corpora from multiple sources to create large but stable data sets, exploring several methods for turning translation memories from several sources into Statistical Machine Translation training data. Finally, it looks into the speech roots of SMT […]
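To give a feel for what "preparing bilingual TMX" can mean in practice, here is a minimal sketch in Python. It is not taken from the TAUS guide: the function name `extract_pairs`, the sample TMX content and the specific cleaning rules (skipping units with an empty side and dropping exact duplicates) are assumptions chosen for illustration only. It reads translation units from a TMX string and returns source/target sentence pairs, the kind of line-aligned data an SMT toolkit typically expects.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data, invented for this sketch.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the power button.</seg></tuv>
      <tuv xml:lang="es"><seg>Pulse el botón de encendido.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Press the power button.</seg></tuv>
      <tuv xml:lang="es"><seg>Pulse el botón de encendido.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the cover.</seg></tuv>
      <tuv xml:lang="es"><seg>   </seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Remove the <bpt i="1"/>battery<ept i="1"/>.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Retire la batería.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_pairs(tmx_text, src_lang, tgt_lang):
    """Return unique (source, target) pairs from a TMX document.

    Assumed cleaning rules for this sketch: language codes are compared
    on their primary subtag only ("en-US" matches "en"), inline tags are
    dropped, units with an empty side are skipped, and exact duplicate
    pairs are kept once.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    seen = set()
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower().split("-")[0]
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() keeps the text and discards inline markup.
                texts[lang] = "".join(seg.itertext()).strip()
        src = texts.get(src_lang, "")
        tgt = texts.get(tgt_lang, "")
        if not src or not tgt:
            continue
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    for source, target in extract_pairs(SAMPLE_TMX, "en", "es"):
        print(source, "|||", target)
```

Real translation memories from several sources would need more than this (encoding normalization, alignment checks, length filtering), so treat the sketch as a starting point rather than a complete preparation pipeline.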

Pangeanic commits to Data-Sharing Platform

After three years of discussions among corporations, language vendors and institutions, Pangeanic commits to the Data-Sharing Platform, a multinational effort to share translation databases among some of the most important corporations. Pangeanic will be a contributor to the project with TMs provided by its clients and its own TMs, which are also being used for its own Moses-based machine translation solution into French, Spanish, Italian and German. It is envisaged that, through Pangeanic’s membership, the customized engines will be able to achieve higher levels of accuracy after re-training. Next time you think languages, think Pangeanic. Your Machine Translation Customization Solutions