20/03/2010

Microsoft Translator v2 & what's in it for them all

As spring approaches and we all look forward to better weather after a long winter of hard work, we received some news about an interesting effort by Microsoft to go "collaborative" with the release of Microsoft Translator v2 (see our review of v1 in November here).

Microsoft has started 2010 by announcing the availability of a “Collaborative Translations Framework” for Microsoft Translator. Following other initiatives like Google's Translator Toolkit and commercial ventures by large LSPs, Microsoft seems to be following a fashion by providing a platform that developers can add to their sites to make them multilingual and to contribute translations online.

How this "in the cloud" information is going to be used remains an open question, just as in Google's (and others') case. Presumably, it will help to improve the quality and availability of "trustable data". The platform, Microsoft claims, combines the scale and speed of automatic machine translation with the accuracy and context awareness of human translation. The widget has been designed "for anyone with a web page".

In addition to the collaborative features, version 2 includes a batch interface to translate large amounts of data, support for communicating with the service securely via SSL, and a “Translate-and-Speak” feature (text-to-speech functionality). The translation APIs are available at no cost to developers and partners in SOAP, HTTP, and AJAX flavors so that developers can choose the one that best fits their requirements. All you need to get started is a Bing Developer AppID.
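To give an idea of what a call to the HTTP flavor might look like, here is a minimal Python sketch. The endpoint URL, the parameter names (`appId`, `text`, `from`, `to`) and the XML shape of the reply are assumptions based on our recollection of the v2 HTTP interface, so please check them against Microsoft's own documentation. The helper names `build_translate_url` and `parse_translate_response` are ours, purely for illustration, and the sketch only builds the request and parses a reply without contacting the service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint of the v2 HTTP interface; verify against Microsoft's documentation.
TRANSLATE_ENDPOINT = "http://api.microsofttranslator.com/V2/Http.svc/Translate"


def build_translate_url(app_id, text, source_lang, target_lang):
    """Build the GET URL for a single translation request (hypothetical helper)."""
    params = {
        "appId": app_id,      # your Bing Developer AppID
        "text": text,         # text to translate
        "from": source_lang,  # source language code, e.g. "en"
        "to": target_lang,    # target language code, e.g. "es"
    }
    return TRANSLATE_ENDPOINT + "?" + urlencode(params)


def parse_translate_response(xml_body):
    """Extract the translated text, assuming the reply is a single XML <string> element."""
    return ET.fromstring(xml_body).text or ""


if __name__ == "__main__":
    # "YOUR_APP_ID" is a placeholder, not a working key.
    print(build_translate_url("YOUR_APP_ID", "Hello world", "en", "es"))
```

Fetching the resulting URL with any HTTP client and passing the body to `parse_translate_response` would complete the round trip. Keeping URL construction separate from the network call makes it easy to swap in the SOAP or AJAX flavor later.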

Microsoft has announced the translation API (v2), which brings real-time, in-place translations to your web site, with collaborative features to help tailor the translations delivered to fit your site. Microsoft is still planning on polishing the translation widget, the toolbar, and alternative user interfaces, as well as analytics for site and application owners, based on user feedback. Admitting the service is open to improvement, the company has promised more customizability, such as limiting the number of languages site owners can show as part of the widget.

In addition, Microsoft is working on making the Silverlight translator control available as part of the Silverlight toolkit release that will ship when Silverlight 4 goes final. It has also enhanced the Bing Translator user site, where you can use the “Translate-and-Speak” functionality whenever you translate into one of the languages supported by the service.

If you want to have a go and try the accuracy of the MT engines and the interface, you can request an invite (you will need a Windows Live ID).

Developers of MT systems (particularly statistical systems, or combinations of statistical and other techniques, as we build at PangeaMT) have long known that data trustworthiness is an important issue.

For a long time, availability was the issue. With the advent of the Internet, multilingual web crawling, donating institutions, etc., data availability has ceased to be the main obstacle to MT engine training. Even though gathering massive amounts of data may be worth the effort for generalist engines, most companies and institutions in need of MT will not have the resources, time or manpower to gather all the data they could. Furthermore, it remains an open question whether massive amounts of data actually improve the "quality" of the engine, or at least the final results prior to post-editing, if your organization is dealing with a specific language domain.

Thus, Microsoft may be taking the right steps to gather noise-free, trustworthy data from human feedback in order to feed its own engines, just as Google does with its Translator Toolkit. How efficient this proves when compared to custom-built, tailored engines remains to be seen.

Microsoft is not competing with the lower-end translation agencies but proving there is a market for "free translation" if MT is good enough. Perhaps this is also a way to lure companies into its language project and provide more data? [Official Source Posting from MS]
