Hybrid Machine Translation

NMT versus SMT results in Japanese

The Pangeanic neural translation project

The last few months have been extraordinarily busy at Pangeanic. We have focused on applying neural networks to machine translation (neural machine translation), with tests into 7 languages (Japanese, Russian, Portuguese, French, Italian, German, Spanish); completed a national R&D project (Cor technology as a platform for translation companies, offering an integrated way of analyzing and managing website translation and document analysis); integrated the CAT-agnostic translation memory system ActivaTM into Cor and our neural engines; and been awarded, by the European Union’s CEF (Connecting Europe Facility), the largest digital infrastructure project to build secure connectors to commercial MT vendors and the EU’s own machine translation service (MT@EC) for public administrations across Europe. Leading machine translation developers such as KantanMT, Prompsit, Tilde and our own PangeaMT have joined forces with consulting company Everis to build IADAATPA, a system that will intelligently work on domain adaptation and the selection of […]

Deep learning – The day language technologies became a Christmas present

It is said that the third Monday of every January is the saddest day of the year. It does not take deep learning to feel so. A long vacation period has ended, with no sight of another one for several months. Overspent, overstuffed, with no more presents to exchange, with winter settling in across the Northern hemisphere and missing the drinks and chocolates that made our sugar levels go sky high, many start booking holidays in the sun. Let’s turn the clocks back to Christmas, and we will remember the last few weeks as the Christmas when language technologies made it to the top of the list. Millions of people, literally, have opened boxes whose content was an electronic assistant with a rapidly improving ability to use human language. There are two main products. The first is Amazon’s Echo, featuring the Alexa digital assistant, which sold more than 5m units. In essence, Echo is a desktop […]

Hybrid Machine Translation Use Case at TAUS Tokyo Forum

by Hirokazu Suzuki and Manuel Herranz

The second TAUS Tokyo Forum was held on April 19th and 20th, 2012 at the Aoyama Centre, hosted by Oracle Japan. The Forum had to be cancelled last year as a result of the earthquake disaster which hit Japan on March 11th, 2011. Otherwise, the forum would have run its third edition this year. All of the participants from Japan were delighted to be able to take part in the forum again. The main topic of the TAUS Executive Forum was use cases of MT technologies and innovative business models. Dieu Tran (Cisco) and Alan Chung (SDL), who received a TAUS award for the best use case this time, talked about their integrated MT/TM system that makes effective use of SMT. Suguru Sakanishi (Yaraku, Inc.) and Miori Sagara (Baobab) presented their collaborative translation platforms that combine MT technology and human resources. Crowdsourcing translation with MT technologies […]

Toshiba and Pangeanic Steps to Machine Translation Hybridation – Article in AAMT

Pangeanic’s R&D team and Toshiba’s Knowledge Media Laboratory have published a joint article describing an initial hybrid pilot that sets the basis for future work on hybrid machine translation technologies from English into and out of Japanese. The article has been published in the December issue (number 50) of the journal of the Asia-Pacific Association for Machine Translation. A copy of the article is available for download from PangeaMT’s site. The article was co-written mainly by Ms Elia Yuste, Mr Manuel Herranz as the initiator of the project, and Mr Alexandre Helle as leader of the nipponization module at Pangeanic, with Toshiba’s input coming from Hirokazu Suzuki of the Corporate Research & Development Center of Toshiba Corporation. The article describes progress made by combining the statistical machine translation open-source platform Moses with Toshiba’s rule-based system to obtain better outputs. Future work points to Pangeanic’s syntax-based approach integrating English and Japanese within a […]

For the Advancement of Arabic/English Machine-Translation Technology (and others): IBM and KACST

The Saudi Arabian National Research and Development Organization announced yesterday a multi-year agreement with IBM to collaborate on, amongst other areas, the advancement of machine translation technologies. Under the terms of the agreement, King Abdulaziz City for Science and Technology (KACST) will purchase an IBM Blue Gene supercomputer that will enable its researchers to perform complex simulations and computational modelling. IBM will provide training services to KACST researchers on the functionality and features of statistical machine translation technology. IBM’s Research and Development team will be in charge of building the machine translation system with the initial basic system capabilities, which will be trained with several million words of data, the basis of the translation training process. IBM will commit researchers and business consultants, who will work together with KACST scientists to further enhance the IBM Machine Translation Engine into a powerful engine for translating Arabic into other languages. This project deals […]

How to measure machine-translation quality

Many people have asked me how they can reliably measure or benchmark the quality of their translation system (rule-based, example-based or statistical). They have bought some commercial rule-based software and are trying it out, building dictionaries and normalization rules, or they are having a first try at what it means to deal with a Moses engine. There are two free systems that work on an input/output basis and will give you an idea of how your system is scoring. Some people use them to test their system against Google Translate, raw MT output or other texts. You can use them, for example, to check how your system is doing in comparison with free Google Translate, Systran online tools, BabelFish, etc. This may give you an idea of your progress as you customize your own tool for a particular application, taking generalist online tools as a basic reference. The tests […]
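To give a feel for what this kind of automatic scoring involves, here is a minimal sketch in Python. The excerpt above does not name the two free systems, so this is not their implementation. As an assumption, it shows a simplified BLEU-style score, a widely used family of automatic metrics that compares MT output against a human reference translation. The function names `ngrams` and `bleu_like`, the whitespace tokenization and the add-one smoothing are all illustrative choices, so the numbers are not comparable with scores from published tools.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_like(hypothesis, reference, max_n=4):
    """Simplified single-reference, BLEU-style score between 0 and 1.

    Assumption: whitespace tokenization and add-one smoothing; published
    BLEU implementations differ in both, so scores are not comparable.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        # Add-one smoothing keeps the logarithm defined when matches == 0.
        log_precision_sum += math.log((matches + 1) / (total + 1))
    # Brevity penalty discourages outputs shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    # Hypothetical outputs from your own engine and a generalist online tool.
    print("own engine:  ", bleu_like("the cat sat on a mat", reference))
    print("online tool: ", bleu_like("a feline was sitting", reference))
```

Scoring your own engine and a generalist online tool against the same reference translations, as in the usage lines at the bottom, gives a rough relative comparison. A single sentence says little, so in practice the scores would be averaged over a held-out test set.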