Google Translate errors have become a kind of pop-culture joke (see our previous entry about the "clitoris festival" in a Spanish town). Some years ago, it was unable to translate "President Obama" correctly and kept rendering him as "Bush". At the time, this was due to the larger amount of data linking the words "President" and "Bush". Now Google has produced another interesting translation error.
Google Translate is one of the most widely used translation tools in the world. We have to say that Google has had the good taste to offer its product as mobile apps, and even as smartwatch apps, in addition to the web version. It is extremely convenient to use, and it has become a reference tool for translation professionals and users of translation services, at least for getting the gist of texts in foreign languages. And let's be fair: Google Translate's machine translations are often of good quality, considering it is very general software, a "non-customized machine translation engine" (read more about how machine translation works here). Because of its ubiquity and availability, it is a product that is hard to beat.
But voilà, Google Translate is an automatic tool, and as such it is prone to errors. It relies on thousands of aligned bilingual documents, so some mistranslations are bound to occur... but some are odder than others. According to BBC reports, the tool recently found no better match for the country name "Russia" than "Mordor" (the "Black Land" of J. R. R. Tolkien's "The Lord of the Rings", the region occupied and controlled by Sauron). And to continue the story, "Russians" became "occupiers". According to Google Translate, Mordor is real and is none other than Russia. Well, Russians may get bad press, but for Google to say this...
So how can these machine translation errors happen?
First of all, it is worth remembering that these errors do not appear in every language pair. In fact, they arose when translating from Ukrainian into Russian. The search giant does not rely on simple dictionaries but on huge databases, many of whose entries are web pages indexed by its own search engine. Google Translate therefore ultimately depends on data provided and created by us, the general public. The tool analyzes millions of documents and matches words and recurring patterns of usage. It finds equivalents between languages statistically: once a word has been observed often enough alongside a given word in the other language, the tool infers that the two are translations of each other. This approach works better for some languages than for others, but combined with other techniques it can produce fairly decent results. The translation of "Russia" as "Mordor" can then be explained as follows:
- It appears that these terms were widely used by Ukrainian soldiers after Russia's invasion of the eastern Ukrainian regions. They created documents and texts that Google was able to find.
- Google then aligned these texts, puns and all.
- Statistically speaking, at some point there was enough data, or even more than enough, equating the word "Russia" with "Mordor", just as happened with "occupiers" (coverage by Ukrainian and Russian media and TV was very one-sided in both cases).
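To make the statistical argument above concrete, here is a toy sketch. It is not Google's actual system. It assumes that word alignment has already been done and that the corpus has been reduced to hypothetical (source word, target word) pairs. It then simply picks the target word most often aligned with the source word. The function names, the counts, and the use of English tokens as stand-ins for the Ukrainian and Russian words are all invented for illustration.

```python
from collections import Counter, defaultdict


def build_table(aligned_pairs):
    """Count how often each source word was aligned to each target word."""
    table = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        table[src][tgt] += 1
    return table


def translate_word(table, word):
    """Return the target word most often aligned to `word`.

    Unseen words are passed through unchanged (a simplifying assumption).
    """
    if word not in table:
        return word
    return table[word].most_common(1)[0][0]


# Hypothetical toy corpora of pre-aligned word pairs; the counts are made up.
balanced = [("Russia", "Russia")] * 50 + [("Russia", "Mordor")] * 5
# The same corpus plus a burst of texts where the joke was aligned as a translation.
skewed = balanced + [("Russia", "Mordor")] * 60

print(translate_word(build_table(balanced), "Russia"))  # Russia
print(translate_word(build_table(skewed), "Russia"))    # Mordor
```

With these made-up counts, the balanced corpus keeps "Russia" (50 against 5), while adding 60 pun-laden pairs tips the majority to "Mordor" (65 against 50). Real systems weigh far more context than a single word count, but they can be sensitive to skewed training data in a similar way.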
To avoid the wrath of Russia, Google of course responded with a statement specifying that its automated tool was not always perfect and that errors could occur. The search giant then assured that it would continue to do its best to improve Google Translate and avoid such errors in the future. The fact remains that relying on millions of documents rather than a simple dictionary to translate a term like "Russia" seems a little unnecessary, but that is another story.