


Speech recognition on smartphones

by Manuel Herranz

February's big event has been Barcelona's Mobile World Congress, and not only for those interested in mobile/cell handsets as a means of communication. The accent has been on the convergence of technology. Whereas pixels and cameras, email or videoconferencing dominated some years ago, and social networking not so long ago, 2010 has shown that cloud computing, which harnesses vast server resources, may soon be available in your pocket. Translation is bound to be just one of those services.

Shouldn't our mission become "to help the world communicate faster"?

Open source programs like Moses have made it possible to democratize access to machine translation (MT) technology and open up the playing field. More importantly, Moses has made MT customizable to a very large degree, so one can build all-in "hybrid" systems without resorting to nice-sounding cocktails of rule-based and statistical machine translation (SMT) systems. In reality, there is little need to integrate rule-based (RB) technology into a hybrid beyond having it act as a "controlled language pre-processor". Anyone who becomes half familiar with the way SMT works and with Moses's pre- and post-processing capabilities soon realizes the oxymoron. Beware of systems claiming to offer the strengths of both "rule-based and statistical systems". You are better off investing your time and money in combination hypothesis (more news on that in this blog during 2010).

The possibility of immediate, on-demand translation is certainly not new and has featured on this blog several times (see Toshiba's announcement in December 2009 of a translation system linking cell phones to a remote server, where the translation is computed and then delivered to the handset, and NEC's glasses offering subtitles). There is a whole new avenue (i.e. market) for translation technologies if we stop and think about the need for immediate translation. Traditionally, our industry has concentrated on solving the need for document translation in its mission to help the world communicate. Now, ten years into the 2000s, shouldn't our mission become "to help the world communicate faster"? On-demand, mobile translation powered by cloud computing creates a new, truly hybrid job in the shape of an interpreter-cum-portable multilingual translator. Almost immediately, questions are raised about the possibility of further integration with existing speech recognition technologies.

I have seen some prototypes working at technology centers over the past two years within given domains. Tourism is a good application, for example: one could pick up the phone in a hotel bedroom and order a taxi or some pizza in one's own language, or even check in. Technologically speaking, the future speech-to-speech multilingual translator can be assembled fairly quickly even with today's technology, but its usefulness as an "all-rounder" is not yet realistic. Massive amounts of data are required for a general translator, and accuracy will always be an issue if any of the speakers starts deviating from the domain or mixing several fields into one. This is often the case with humans, who may start a conversation about the legalities of a deal which then develops into a series of comments about accounts and financial considerations or implications related to any field, from fallen golf stars to real estate, civil engineering, fishing rights or car insurance. Thus, a domain-strong translation system needs to be a first priority, as the makers of speech recognition systems have already understood, with modules serving the legal, medical or engineering professions.

There is a good review with some insights in today's San Francisco Chronicle: "Smart phones may one day be translators".