I attended Localization World London in two roles. As a guest speaker within the EU-sponsored MosesCore project session organized by TAUS, I talked about what I call an upsurge in machine translation, almost a “transition frenzy” towards post-editing “future stability”. I was also there as an exhibitor of PangeaMT’s DIY SMT technologies. The session formed part of the Pre-Conference Day and was lively, with plenty of Q&A from attendees, reflecting the high interest MT has stirred among translation users and practitioners.
Prof. Hieu Hoang gave a general introduction to what an SMT system is as a translation technology, and to what translation models and language models are. He explained the distinction between a translation model and a language model, and how probabilities of phrases are used to work out whether the output sentence is grammatically correct, properly reordered, etc. Prof. Hoang related the story of how he originally developed Moses as a successor to Pharaoh and now only maintains it, as it has become a community-based project in which he stopped being the largest contributor long ago. He cleared up some misconceptions about Moses:
- “Only runs on Linux”: NO. Mostly, but it also runs on Windows 7 (32-bit) with Cygwin 6.1; Mac OS X 10.7 with MacPorts; Ubuntu 12.10 (32 and 64 bit), Debian, Fedora and openSUSE.
- “Difficult to use”: It is now easier to compile and install with Boost bjam, and no installation is required at all if you use the binaries, which are available for Linux, Mac and Windows (Cygwin). As in the past, it comes with ready-made models trained on Europarl.
- “Unreliable”: Absolutely not! The community monitors check-ins, there are more and more regression tests, and nightly tests run end-to-end training (see statmt.org/moses/cruise). Moses has been tested on all major OSes. It offers models already trained on Europarl in 8 languages, and they work.
- “Only phrase-based”: NO. From the beginning, it has been an extension of Pharaoh, not just a replacement for it.
- It is not developed by one person but by a community!
- Some people claim it is slow. Really? It is fast enough and surpasses previous engines: we are talking about milliseconds to produce translations. In fact, thanks to Ken Heafield, it is now multithreaded, and we’ve reduced disk I/O and disk space requirements.
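To make the translation model versus language model distinction from Prof. Hoang's introduction more concrete, here is a minimal toy sketch in Python. It is not Moses code and does not use any Moses API; the words, probabilities and function names are all invented for illustration, and a real decoder combines many more features (reordering model, word penalty, etc.) with tuned weights. The translation model picks target words for each source word, and the language model then decides which ordering of those words reads as fluent English.

```python
import itertools
import math

# Toy translation model: P(target word | source word). All numbers are invented.
TRANSLATION_MODEL = {
    "la": {"the": 0.6},
    "casa": {"house": 0.8},
    "blanca": {"white": 0.9},
}

# Toy bigram language model: P(word | previous word). "<s>" marks sentence start.
BIGRAM_LM = {
    ("<s>", "the"): 0.5,
    ("the", "white"): 0.2,
    ("white", "house"): 0.4,
    ("the", "house"): 0.3,
}
UNSEEN_PROB = 1e-4  # crude floor for bigrams the toy model has never seen


def translate_words(source_words):
    """Translation model step: pick the most probable target word for each
    source word and return the words plus their summed log-probability."""
    pairs = [
        max(TRANSLATION_MODEL[src].items(), key=lambda kv: kv[1])
        for src in source_words
    ]
    targets = [target for target, _ in pairs]
    tm_score = sum(math.log(prob) for _, prob in pairs)
    return targets, tm_score


def lm_log_prob(words):
    """Language model step: score the fluency of a word sequence."""
    padded = ["<s>"] + list(words)
    return sum(
        math.log(BIGRAM_LM.get(bigram, UNSEEN_PROB))
        for bigram in zip(padded, padded[1:])
    )


def best_translation(source_words):
    """Try every ordering of the translated words and keep the one with the
    highest combined (translation model + language model) log score."""
    targets, tm_score = translate_words(source_words)
    candidates = {
        " ".join(order): tm_score + lm_log_prob(order)
        for order in itertools.permutations(targets)
    }
    best = max(candidates, key=candidates.get)
    return best, candidates


if __name__ == "__main__":
    best, candidates = best_translation(["la", "casa", "blanca"])
    print(best)
```

In this toy setup every ordering gets the same translation model score, so it is the language model alone that prefers the reordered “the white house” over the word-for-word “the house white”.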