Author Archives: Pangeanic

NMT versus SMT results in Japanese

The Pangeanic neural translation project

The last few months have been extraordinarily busy at Pangeanic. We have focused on the application of neural networks to machine translation (neural machine translation) with tests into 7 languages (Japanese, Russian, Portuguese, French, Italian, German, Spanish); completed a national R&D project (Cor technology as a platform for translation companies, offering an integrated way of analyzing and managing website translation and document analysis); integrated the CAT-agnostic translation memory system ActivaTM into Cor and our neural engines; and been awarded, by the European Union’s CEF (Connecting Europe Facility), the largest digital infrastructure project to build secure connectors to commercial MT vendors and the EU’s own machine translation service (MT@EC) for public administrations across Europe. Leading machine translation developers such as KantanMT, Prompsit, Tilde and our own PangeaMT have joined forces with consulting company Everis to build IADAATPA, a system that will intelligently handle domain adaptation and the selection of the most appropriate engines through secure connectors for public administrations in the EU.

So, time to recap and describe our experience with neural machine translation and how Pangeanic has decided to shift all its efforts into neural networks and leave the statistical approach as a support technology for hybridization.

We took the training sets of our existing SMT engines as clean data, trained neural engines on exactly the same data, and ran a parallel human evaluation of the output of each system: the existing statistical machine translation engines and the new neural engines. We are aware that if data cleaning was very important in a statistical system, it is even more so with neural networks. We did not add any material because we wanted to be certain that we were comparing exactly the same data trained with two different approaches.

A small percentage of bad or dirty data can have a detrimental effect on SMT systems, but if it is small enough, statistics will take care of it and won’t let it feed through the system (although it can also have a far worse side effect, which is lowering the statistics across certain n-grams).

Visual sample of statistical candidates with best candidate proposed in a statistical machine translation system

We selected the same training data for languages which we knew were performing very well in SMT (French, Spanish, Portuguese) as well as those that have been known to researchers and practitioners as “the hard lot”: Russian as the example of a morphologically very rich language, and Japanese as a language with a radically different grammatical structure where re-ordering (that’s what hybrid systems have done) has proven to be the only way to improve.

Japanese neural translation tests

Let’s concentrate first on the neural translation results in Japanese, as they represent the quantum leap in machine translation we have all been waiting for. These results were presented at TAUS Tokyo last April. (See our previous post TAUS Tokyo Summit: improvements in neural machine translation in Japanese are real).

Japanese neural translation engine for the electronics and IT field

Tokenizer.perl and Mecab were used for English and Japanese tokenization respectively.

We used a large training corpus of 4.6 million sentences (that is nearly 60 million running words in English and 76 million in Japanese). In vocabulary terms, that meant 491,600 English words and 283,800 character-words in Japanese. Yes, our brains are able to “compute” all that and even more, if we add all types of conjugations, verb tenses, cases, etc. For testing purposes, we did what one is supposed to do in order not to inflate percentage scores and took out 2,000 sentences before training started. This is standard practice in all customization: a small sample is held out so that the engine is tested on the kind of text it is likely to encounter but has not seen. Any developer including the test corpus in the training set is likely to achieve very high scores (and will boast about it). But BLEU scores have always been about checking domain engines within an MT system, not across systems (among other things because the training sets have always been different, so a corpus containing many repetitions or the same or similar sentences will obviously produce higher scores). We also made sure that no sentences were repeated, and even similar sentences were stripped out of the training corpus in order to achieve as much variety as possible. This may produce lower scores compared to other systems, but the results are cleaner and progress can be monitored very easily. This has been the way in academic competitions and has ensured good-quality engines over the years.
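The hold-out procedure described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline we actually ran: the function names, the normalization rule and the random seed are assumptions, and it only removes exact repeats (after lowercasing and whitespace normalization), whereas stripping out merely similar sentences would need some form of fuzzy matching that is not shown here.

```python
import random


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial variants count as repeats."""
    return " ".join(sentence.lower().split())


def split_corpus(pairs, test_size=2000, seed=13):
    """Deduplicate (source, target) pairs, then hold out a test set.

    The test sentences are removed *before* training, so the engine is
    never scored on material it has already seen.
    """
    seen = set()
    unique = []
    for src, tgt in pairs:
        key = normalize(src)
        if key in seen:
            continue  # drop repeated source sentences
        seen.add(key)
        unique.append((src, tgt))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique)
    test = unique[:test_size]
    train = unique[test_size:]
    return train, test
```

Calling `split_corpus(pairs, test_size=2000)` on a list of sentence pairs returns a training set and a test set that share no source sentence, which is the property that keeps automatic scores from being inflated.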

The standard automatic metric in SMT did not detect much difference between the output in NMT and the output in SMT.

BLEU does not detect the huge difference in perceived quality - WER is a better indicator

However, WER was showing a new and distinct tendency.

NMT versus SMT results in Japanese

NMT shows better results in longer sentences in Japanese. SMT seems to be more certain in shorter sentences (trained as a 5-gram system).
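For readers who have not worked with WER (word error rate), here is a minimal sketch of how it is usually computed: the word-level edit distance between the reference and the MT output, divided by the length of the reference. The function name is ours, and the code assumes both sentences have already been tokenized and joined with spaces (for Japanese, that means running a tokenizer such as Mecab first). It is not the scoring script used for the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Lower is better: an output identical to the reference scores 0, and every substituted, missing or extra word adds to the error.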

And this new distinct tendency is what we picked up when the output was evaluated by human linguists. We asked the Japanese LSP Business Interactive Japan to rank the output from a conservative point of view, from A to D: A being human-quality translation, B a very good output that only requires a very small percentage of post-editing, C an average output where some meaning can be extracted but serious post-editing is required, and D a very low-quality translation with no meaning. Interestingly, our trained statistical MT systems performed better than the neural systems in sentences shorter than 10 words. We can assume that statistical systems are more certain in these cases, when they are only dealing with simple sentences with enough n-grams giving evidence of a good matching pattern.

We created an Excel sheet (below) for human evaluators with the original English to the left and the reference translation. The neural translation followed. Two columns were provided for the ranking and then the statistical output was provided.

A table showing original English and Japanese reference translation

Neural-SMT ENJP ranking comparison showing the original English and the reference translation, with the neural ranking to the left and the statistical system to the right
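A ranking sheet like this one is easy to summarize once exported. The sketch below is an assumption about how such a tally could be done, not the script we used: each row is taken to be (English source, neural grade, SMT grade), a sentence counts as accepted when graded A or B, and rows are split at the 10-word mark mentioned above. The three example rows are made up for illustration.

```python
from collections import Counter

ACCEPTED = ("A", "B")  # human quality, or only minor post-editing needed


def acceptance_by_length(rows, threshold=10):
    """Share of sentences graded A or B, per system and length bucket.

    rows: iterable of (source_sentence, neural_grade, smt_grade).
    """
    accepted = {"short": Counter(), "long": Counter()}
    totals = Counter()
    for source, neural_grade, smt_grade in rows:
        bucket = "short" if len(source.split()) < threshold else "long"
        totals[bucket] += 1
        if neural_grade in ACCEPTED:
            accepted[bucket]["neural"] += 1
        if smt_grade in ACCEPTED:
            accepted[bucket]["smt"] += 1
    return {
        bucket: {
            system: accepted[bucket][system] / totals[bucket]
            for system in ("neural", "smt")
        }
        for bucket in totals
    }


# Made-up rows, only to show the expected input shape.
example_rows = [
    ("Press the button.", "B", "A"),
    ("Open the lid.", "C", "A"),
    ("Before you connect the device to the network make sure "
     "that the firmware is up to date.", "A", "C"),
]
summary = acceptance_by_length(example_rows)
```

Splitting by sentence length is what makes a pattern such as "SMT is more certain in short sentences" visible; a single overall percentage would hide it.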

German, French, Spanish, Portuguese and Russian neural translation results

The shocking improvement came from the human evaluators themselves. The trend pointed to 90% of sentences being classed as A (perfect, naturally flowing translations) or B (containing all the meaning, with only minor post-editing required). The shift is remarkable in all language pairs, including Japanese, moving from an “OK experience” to a remarkable acceptance. In fact, only 6% of sentences were classed as a D (“incomprehensible / unintelligible”) in Russian, 1% in French and 2% in German. Portuguese was independently evaluated by translation company Jaba Translations.

Human evaluation of neural translation in German, French, Russian

Human evaluation of neural translation in German, French, Spanish, Portuguese, Italian, Russian

This trend is not particular to Pangeanic. Several presenters at TAUS Tokyo pointed to ratings around 90% for Japanese using off-the-shelf neural systems compared to carefully crafted hybrid systems. Systran, for one, confirmed that they are focusing only on neural research/artificial intelligence and throwing away years of rule-based work, statistical and hybrid efforts.

Systran’s position is meritorious and very forward thinking. Current papers and some MT providers still resist the fact that, despite all the work we have done over the years, Multimodal Pattern Recognition has gained the upper hand. It was only computing power and the use of GPUs for training that was holding it back. The above article at PangeaMT provides some information about what is changing in the automated translation landscape as we speak and an example of the first neural papers back in the 1990s which has guided much of our own R&D.

Neural networks: Are we heading towards the embedment of artificial intelligence in the translation business?

BLEU may not be the best indication of what is happening with the new neural machine translation systems, but it is an indicator. We were aware of other experiments and results by other companies pointing in a similar direction. Still, although the initial results may have made us think that there was no use for it, BLEU is a useful indicator – and in any case, it was always an indicator of an engine’s behavior, not a true measure of one overall system versus another. (See the Wikipedia article https://en.wikipedia.org/wiki/Evaluation_of_machine_translation).

Machine translation companies and developers face a dilemma, as they have to do without the existing research, connectors, plugins and automatic measuring techniques and build new ones. Building connectors and plugins is not so difficult. Changing the core from Moses to a neural system is another matter. NMT produces amazing translations, but it is still pretty much a black box. Our results show that some kind of hybrid system using the best features of an SMT system is highly desirable, and academic research is moving in that direction already – as happened with SMT itself some years ago.

I brought back some useful tips from my attendance at SlatorCon in London. One is that translation buyers are still in sheer need of affordable translation solutions that can centralize assets and workflows. Another is that neural MT is taking center stage as the technology that can truly change the game. The most important one, I would say, is that venture capital money is pouring into the translation industry because it sees strong similarities with other industries (advertising, for one) that were disrupted years ago and produced something new.

“There was not a lot of technical innovation in the advertising industry until the late 1990s,” observed Marcus Polke, Investment Director from Acton Capital Partners. “And then came the Internet, which bypassed and marginalized ad agencies as online and offline advertising transformed into a complex landscape.”

Yes, the translation industry is at the peak of the neural networks hype. But looking at the whole picture and at how artificial intelligence (pattern recognition) is being applied in several other areas in order to produce intelligent reports, tendencies and data, NMT is here to stay – and it will change the game for many, as more content needs to be produced cheaply with post-editing, or at light speed when good machine translation is good enough. Amazon and Aliexpress are not investing millions in MT for nothing – they want to reach people in their language with a high degree of accuracy and at a speed human translators cannot match.

TAUS Tokyo Summit: improvements in neural machine translation in Japanese are real

Not that business plans are written in stone any longer, but efforts by experts to provide an insight are always welcome. TAUS Tokyo Summit provided a much-awaited set of good news about improvements, as perceived by humans, in neural machine translation in Japanese. English-Japanese was a well-known difficult language pair for rule-based machine translation, and statistical machine translation provided a really awful experience for many Japanese audiences. It has historically been one of the hardest language combinations to automate. It seems that neural machine translation may be the answer.

Day 1 – Where is the translation industry heading?

Jaap van der Meer of TAUS began by summarizing the latest meeting of thought leaders, who met in Amsterdam in order to brainstorm a potential landscape and priorities for the language industry in the next five years. If machine translation hype was at its peak five years ago with statistical machine translation and all sorts of hybrids, we are now beginning to experience the neural MT hype. But adopters and developers are much wiser. If data was king some years ago, it seems we may not need so much in the future. Datafication was a process started some years ago after an article called “The Unreasonable Effectiveness of Data” (Alon Halevy, Peter Norvig, Fernando Pereira, 2009, Google). The article said that the more data the better if our aim was to collect data to train machine translation engines and models. The more data we had to teach the algorithms to decide what was best, the better a statistical system would translate. The problem has always been the lack of clarity about copyright issues with translation data. For example, the law differs between the US and Europe with regard to translation ownership.

TAUS has been focusing on the development of tools and practical services for the translation industry it serves, such as

  • Machine Learning
  • Quality Dashboard
  • Machine Translation
  • Intelligent TM
  • Interoperability, etc.

The set of services and tools (such as DQF) may soon become industry standards, and they can be used to benchmark and measure productivity in-house and also against other (anonymized) players. DQF is now available as an API and can collect data in real time as translators work, without disturbing them. It is a transparent model: reports and statistics can be tracked and benchmarked against other translators.

Jaap mentioned that Europeans are very worried that Google and Microsoft will “fix the problem” and that Europe will be left out of the language technology race, referring to one of his previous articles, “The Brains but not the Guts”. Europe is exporting talent to the US, an army of language scientists who are helping those two giants overcome the language barrier. On the other hand, machine translation has been accepted; it is becoming an API. On a daily basis, output from machines is 500 times bigger than the output from all professional translators put together. The translation industry is growing but also changing radically. What companies do nowadays is not pure translation any longer but telemanagement, post-editing, transcreation services, project management crowdsourcing, telemarketing, etc.

Translation is datafied. We want to know everything happening in a translator’s environment so we can accurately measure how many segments are translated, or words per hour. Eye movement tracking and word suggestions have been around academia for some time but they have now crossed the barrier to commercial MT services. We even track translators’ social graphs, how the weather or news affect the translator, third party applications, how much leveraging from previous translations was used. All that information can help us to automate project management more and improve resource allocation. We are moving to a future where project management will also be automated.

An interesting parallel was drawn between industries when Jaap mentioned that food delivery people do not have a boss, they have an app. All they are interested in is where to pick up the food and where to deliver it. And that’s a kind of post-editor. Translation buyers are finding that some vendors send their jobs out to the internet, and freelance translators run them through general machine translation and post-edit the result. “I only had to do some minor fixes”, said one PM from a leading translation company. The fear is “how long until my client finds out he can do the same?”, that is, how long until translation buyers find out they can post jobs on the internet (via an app, maybe) and pay post-editing rates to cut out project management fees? In short, will everything be handled by robots in the near future? Pay-as-you-go models may change and users will become more active in the management of terminology, labelling, etc.

The representative from Athena Parthenos created some controversy by stating that creativity will help the industry survive, as creativity is the realm of humans. Mark Seligman agreed, saying that what machine translation cannot do is convey the emotions of humans, which is what marketing is all about. Chris Wendt, from Microsoft, disagreed: “I have seen very creative neural translations”. Another possibility, according to Jaap, was that post-editing will no longer be needed: there will be people behind dashboards and people doing the creative jobs.

Day 2 – Neural machine translation has cracked the language barrier in Japanese

But the juicy news came on Day 2. Presentations from Systran, Pangeanic and Google provided news about the development of neural networks applied to machine translation, with a particular accent on improvements in neural machine translation in Japanese, and Human Science reported on post-editing output from Google’s NMT API. The consensus was that neural machine translation produces more natural and fluent output than phrase-based MT. However, there are problems, too. Neural machine translation can produce unreliable output when confronted with unusual input or when a strictly literal rendering is desired. On the plus side, neural machine translation seems to be highly adaptable and it has the potential to be applied to other natural language tasks.

SDL presented UpLift, a technique similar to their old concordance check which combines words and small subsegment units, and which reminds me a lot of an old technique used by Déjà Vu and Transit in the past. The difference now is that it is automatic, it is applied to all words in a sentence and it shows their translation. The underlying technology is the creation of a glossary “behind” the TM. This is done by creating an index (at the end of the day, when the PC is not in use, according to their own recommendation). This is combined with syntactic analysis for Asian languages. The new version “repairs” fuzzy matches automatically if the difference is only a word or two (a feature also offered by our own ActivaTM). I found it striking to learn that, according to SDL, people do not bother to re-use and re-train their own engines once created. Its automated training system has not been so successful (perhaps because of data privacy issues, since SDL is, at the end of the day, another LSP).

Mark Seligman gave an overview of speech-to-speech translation, particularly from a Japanese (and in general Asian) perspective, from the first speech-to-speech product by LinguaTec (currently Lingenio) up to 2017. Most of these products were ahead of their time. NEC had a Japanese-English system, but the real watershed came with the Google Translate app, which gave birth to speech translation. Jibbigo was happening in Europe at the time, too. Sony had one phrase-based app, and Phraselator was used by the US military. Mark provided an impressive live speech-to-speech Japanese-English translation over his app, SpeechTrans, and stated that “Google-type glasses” with subtitles or similar technology would be available in 3 years, not 300.

Systran’s presentation provided a lot of information about their OpenNMT initiative and how they have created a community à la Moses. I would like to write more about the value of this worthy initiative and how it may become a very significant force in a post-Moses world, although SMT systems will have life for some time. The better outputs provided by neural machine translation in Japanese have prompted a kind of fever and much higher acceptance levels, whereas phrase-based systems behaved with a higher degree of predictability only with close language pairs. Morphologically rich languages such as the Slavic family also proved notoriously hard to automate.

Our presentation offered information on our first results on engines built with identical datasets in French, German, Italian, Spanish, Portuguese, Russian and Japanese, one set using an SMT system and the other a neural network, with astonishing results. Systems built with identical data but in a different way (statistical versus NMT) provided rankings of “human quality” or “almost human quality” in 80%-90% of the 250 sentences tested, including Russian. The improvements in neural machine translation in Japanese are real.

NMT provides a better translation than the original translation

A copy of our presentation and results is available on SlideShare.

As Mark had previously done with a speech-to-speech system, Microsoft’s Chris Wendt provided a live test of his speech translator, starting with the Star Trek sample (an alien and a human speaking to each other with a different device). The audience had to keep quiet so that noise did not have an impact on the translation. Speech translation had been inspired by science fiction, yes, but it was now a reality (the same happened with Jules Verne’s submarines, Around the World in 80 Days, etc.). Microsoft’s neural network can accept accented English from non-native speakers as input. It works with Indian, French or Spanish accents, but it is not so good with strong German or Russian accents. He introduced TRUETEXT, which handles hesitations by rendering what you are trying to say without the hesitations, stops, etc., so that the input is more suitable for machine learning.

Microsoft speech-to-speech translation demo

Transcribed Japanese text in Microsoft live demo

There are many potential uses of multilingual speech-to-speech technology: multilingual meetings, schools in the US and situations where there is one speaker and many are listening. I wonder if this may create an audience of “lazy” language learners? People asked Chris questions verbally in Japanese, Italian and Chinese, and Chris replied in English, which was shown in each language on the monitor. He then switched to his native German (changing the language settings in the device) and the translation was provided as written text on the monitors. He still received questions in Singaporean Chinese, but now the system was translating from his German into Japanese and Chinese. The system slowed down a little bit, but the leap was also great, with a lot of people asking questions. Chris stated that English-Spanish is the best working combination, as they are syntactically similar languages and there is also a lot of training material.

The last presentation was from Google’s Macduff Hughes, who began by addressing an audience that had already been convinced of the superiority of neural networks for Japanese-English translation. “Last year NMT was a rumor, 6 months it was the beginning, and now it is here”. Hughes took Spanish as an example of one of the best language pairs and analyzed how much better and more fluent neural machine translation was in comparison to phrase-based. In SMT, gender was wrong in several instances because of sentence length, but as neural absorbs the whole sentence, it fixes a lot of the small annoying errors in Spanish, though not all the time.

GNMT is not ready to handle tags yet (in fact no neural system can yet). Moderate amounts of in-domain data can adapt a model. The challenge is that it can be hard to evaluate, and automatic training, stopping and scoring are also difficult. So in this respect, there is a lot of good work that has already been done in statistical systems that cannot be imported into neural networks so easily – a conundrum faced by all MT developers.

Interestingly, Hughes pointed to experiments showing that source sentences meaning more or less the same thing can produce similar results, which suggests that a kind of interlingua has been developed. Knowledge can be transferred to chat or other neural network applications.

But interlingua is another story…

web and spider crawling down

A web of problems: Why Google Translate and website translation can’t marry

It is not news that machine translated websites are penalized by search engines. Google has developed its technologies on the back of reliable bilingual website crawling and freely available public data. After ditching rule-based engines (Systran) back in 2006, it embarked on a mission to use statistical machine translation (SMT) as a byproduct of its own data analysis. Websites that use machine translation to inform users are crawled and aligned, but those alignments provide data that adds dirt (read: uncertainty) which worsens the probabilities and hence the output (read: the translation). That is why Google Translate and website translation can’t marry.

A machine translated website will be penalized by Google, for it is dirty. It is also proof of laziness on the part of those responsible. The search giant wants to analyze natural, human data. We recently bumped into an article on Slator.com that got our feathers all aflutter. In short, it proved the above point, which has been a known issue to translation companies and those offering proxy translation, often with the economical machine translation option.

Nowadays, even e-commerce platforms (see the Magento help section on multilingual sites) do not recommend using machine translation for professional results and better ranking. It may sound ironic, but search engines (read: Google) will penalize websites that use Google Translate for their multilingual versions.

Pangeanic has been a diehard advocate of quality website translation, developing Cor as a crawling and translation assistance technology that does not interfere with any of the code, nor does it provide machine translated output to Google’s algorithms. It checks your content, extracts the text and sends it out for translation. Whenever we hear that a website or company will use raw proxy translation or simply Google Translate, we feel sad. It means lost business, wasted time and investment, having to face the fact that the wrong option was chosen some time ago, and lost credibility: losing business and customers when the intention was to win them.

Google’s violation guidelines

Google clearly bans automatically generated content (in order to avoid black hat SEO and similar techniques), including “Text translated by an automated tool without human review or curation before publishing”. Look for it in its violation guidelines. Thus, raw machine translation and unnatural results (and it is not difficult to detect that a text has been produced by software) will bury your website deep in a web of penalization. This kind of careless publication is viewed as spam or, worse still, copied or duplicated content. You will find it hard to make up for it, unless you are prepared to do properly what should have been done well in the first place. Follow this link to learn more about the dangers of duplicate content.

But Pangeanic develops machine translation technologies, doesn’t it?

Yes, we do. We are a well-known developer of machine translation and language technologies. We use them in order to automate processes, and machine translation is particularly useful in controlled language situations, like instruction manuals and documentation for the automotive industry. It is extremely useful for gisting, that is, getting a quick idea at light speed of what a text in a foreign language says. It also helps translators in certain situations to pre-translate and post-edit the content, which always needs a final verification in order to ensure it flows as natural language.

If your website is rather big (a large e-commerce site, for example, can contain tens of millions of words) and you decide to translate sections of your website using raw MT, there is a quality option to consider. We can offer machine translation engines that are trained with your previous translations (aligned as reliable “translation memories”), which will speak your language in your style and will contain your terminology, specific to your products, services and industry. Creating engines with your own data, or customizing our own engines with your data and terminology, will produce better quality translations than general, online machine translation tools. Our expert translators can post-edit the content to make sure it conveys the message as it should.

Your website MUST PROVIDE VALUE

This is surely one of the most difficult things to do, but it is extremely important to search engines. Your content must be informative and engaging. Bounce rates are an indication of how visitors interact with your site, but a high bounce rate is not necessarily an indication of a bad website. Some of your pages may offer exactly the information the visitor was looking for, so the visitor leaves without interacting because he or she found it. Maybe the person spent a minute, two, three or more reading it. Check this informative post by Yoast on why a high bounce rate is not necessarily a bad thing for your website. A machine translated website simply does not offer the quality content or the value website visitors want.

Multilingual SEO strategy

Keywords cannot be machine translated: people search for different things in different places.

A simple keyword like “sneakers” can serve as an example (follow this article for a list of top ten disagreements between US and British English). It is widely used in the US, although more profusely in some areas than others. British English uses “trainers” (from “training shoes”). People looking for this kind of footwear will not land on your page if you are using a different keyword – and so it happens with languages. Machine translated keywords just won’t work in other languages.

Pangeanic solves this challenge with specialist translators who have a flair for marketing and are aware of such issues. They use our website analysis and SEO tools (SEMRush, Google AdWords, etc.) in order to check the popular options in each country/region so you can make an informed decision about how to market your products from your website, and not use a general or direct translation.

5 tips to translate a website in many languages and embed it in your business strategy

by Manuel Herranz

Large enterprises and even SMEs around the world are realizing how important it is to translate a webpage into many languages.

1. A free website translator simply isn’t enough.

It may do the job fairly well if you just need to understand a website in another language, but that kind of automatic translation is not good enough when you are looking to attract customers.

2. Free website translations published as good content send the wrong message to your potential audience.

Google can be quoted as the best example. The search giant is very aware that it is the search engine of choice used around the world and it needs to be available to everyone. Since there are still billions of people who can’t read English or understand it, Google provides the option of translating websites and search results into the language they are familiar with – but this is a quick, on-the-fly HTML conversion for information purposes only.
If you want to establish a solid business presence in many countries around the world, then you need professional website translations as well.

3. Thanks to a multilingual website containing website translations of your original product descriptions in other languages, your target audience is much wider.

You have been targeting a particular audience since the inception of your business. Translating your web content into several languages has many benefits, and one could literally write a book on them. The number one, of course, is that if your website has always been monolingual, then you were only communicating with people who understood your main language. With website translation you can rank in search engines and carry your message to people who don’t understand your website’s first language. It will actually make sense to them when they visit your website, click on their language button or tab and are able to read everything that is written on it.

Website translations increase SEO visibility

Brand image is the most important thing for any business in the world and brand image is not seen by looking at the size of a business or the quality of its products. Brand image is measured by looking at the people’s attitude towards a business.

A business can change people’s attitude towards it through its marketing efforts. Your brand image starts to build up as you start making some place in the hearts of your customers. That’s done through intelligent marketing, connecting advertisements and by personalizing your messages for them. When you translate a website into Spanish, you are opening up to an audience speaking the 2nd most spoken language in the world (500 million). Spanish has a strong presence not only in Europe and Latin America, but also in the United States – and brands are learning the power of marketing in US Spanish.

Pangeanic has a long relationship with Japanese companies. If you are Japanese and you decide to translate a website from Japanese to English, you will understand that it’s an amazing way of starting a connection with people from different corners of the world.

4. Better SEO and marketing results.

Start introducing new content on your website in multiple languages and you will see an increase in traffic and conversions – that is almost guaranteed. There are several strategies to do so, either with a multisite strategy or with a multilingual site.

Related Content – Learn more about multisite and multilingual sites for SEO:
3 Tips on translating a website and website localization

The more languages you add to a website, the more keywords search engines will detect on your site. OK, there is work to do in Analytics, regular publishing, geo-localization, website hosting speed, etc. But do not even think twice: the more languages a website contains, the higher the chances of being in the top spot in Google.

But rankings are just one objective. The point about Inbound Marketing is that your website will act as a point of reference, as a result of the knowledge it provides to its visitors. When you convert your customers into loyal customers, you can rest assured that your customers are not going anywhere else for many years to come. They will also review your services and provide testimonials. Customer loyalty is the added collateral benefit when you translate your website into different languages. You will benefit from a greater online reputation. Remember younger generations were born with the Internet. Reviews and comments, plus the corporate information you may add in their language are more relevant than marketing material in many cases.

With website translations (and not an automatic “translate website button” or a “webpage translator”), you make your brand a part of people’s lives by connecting with them culturally.

5. Establish a long-term relationship with a translation company.

Your website is most probably already published and content is up. And quite likely, the only place containing all the text that needs translating is… your website. It is typical for a website to develop over time. Pangeanic technologies can crawl your website and extract the text in a bilingual format for you to publish immediately, keeping a bilingual copy of all your linguistic assets.

If you publish content regularly, our Cor technology will make it even easier for you to keep track of your publications automatically. You publish and Cor detects your new content, extracts it and sends it to a project manager or translator so it can be processed at the regular interval you require. Watch the video below to see Pangeanic’s crawler in action, keeping track of our own publications.

Lastly, if you need to translate a website whose content is highly confidential before public release, our Client Portal makes it easy for you to upload content in a completely secure manner thanks to our encrypted solutions.

 


Machine translation: Can it be used to translate travel industry content?

by Manuel Herranz
There have been strong opinions for and against machine translation over the last few years. Whilst the general public has become a keen user of free online services, professional translators have poured bitter criticism on the technology. This is understandable, because the language industry is a small industry compared with other sectors where automation took place years ago (automotive industry, printing, telecommunications, to name a few). The Internet, and in general any industry based on electronic communications, has added to the increase in demand for multilingual websites, which means more translation for eCommerce sites and more website translations.

There are many supporters of machine translation technology because of its many advantages and the problems it has solved where a translator could not be at hand and human translation was not an option. See the video celebrating Google Translate’s 10 years. But it has also gained something of a bad press, particularly because of the various free online translators (and I stress the free) on the web.

If you read our articles in this blog often enough, you know by now that Pangeanic is a developer of a machine translation platform. We build engines for particular applications and clients. Our research team’s collaboration over the last 8 years with Valencia’s Polytechnic, the Computer Science Institute in Valencia, the EU’s Expert Project and Spain’s Center for Technological Innovation (CDTI) has borne fruit in our Pangea version 3: an improved, state-of-the-art platform that not only automates the engine training and retraining process, but also incorporates search engine capabilities in a hybrid translation memory + machine translation approach. Even so, we advise companies to be cautious about applying machine translation solutions as if it were all as easy as copying and pasting into a Google Translate panel. Free comes at a cost, and sometimes a very expensive cost.

Read more: Google is 10 years old

Although some free solutions, like Google Translate, can produce reasonable results in certain language combinations, mostly with English as the source or target language, some content types and some language pairs simply will not lend themselves to machine translation under any circumstance. Japanese and English is a language combination known to produce unintelligible results. We learnt a lot from our 2-year collaboration with Toshiba. Nevertheless, it is important to make a distinction between these free “translate any language” online tools and a custom-built enterprise machine translation solution.

The risk of no post-editing in travel sector translations

“Safety is important for us here at Novabikes” becomes “Safety above everything is important for Novabikes United States” – Google believes “us” is meant as “U.S.”

In this case, industry specific content and very often the client’s own terminology is used to build a tool for a specific purpose. Engines learn how the client (user) wants to translate and MT engines are trained with large quantities of language data. In other cases, hybrid approaches ensure further customization with specific client or language rules and translation memories.

Benefits of enterprise-level machine translation solutions

Nowadays, businesses can use machine translation for a wide range of purposes, including:

  • translating online help,
  • translating knowledge bases,
  • data collection,
  • customer services (multilingual chat systems),
  • emails in a different language to understand what a potential client is asking for (lead generation) – although articulating a marketing message with machine translation is another matter,
  • internal communications among multilingual staff (read more about how IBM used feedback from its own staff to improve its own MT solution),
  • and, in general, low value content for which there was no translation budget in the past.

Under the right conditions and processes, and with a limited level of human input, MT can now deliver translation almost in real time at a more than acceptable quality, akin to that of human translators. Companies trading internationally can considerably improve their translation productivity and language reach and, through it, their operations. A typical machine translation system like PangeaMT can turn around several tens of thousands of words per hour, either for immediate use or for light or heavy post-editing by humans. Machine translation is now an option for content which was previously considered to have no ROI, for which there was never any budget, or which was overlooked because of time constraints.

Pangeanic’s Use Case: Translation services for the hospitality and tourism sector Ona Hotel Group

Translate travel industry content – The Machine Translation challenge

But are machine translation technologies good for the travel sector? The travel industry can throw up one hurdle after another from a translator’s point of view. However, scalability and response times are by far the number one challenge. Let’s take Booking.com as an example. They have well over half a million hotels in their inventory. Even with a single short 50-word review on each (and most hotels have several reviews), we would be looking at over 27 million words of content, to be translated into several languages. With several reviews per hotel and new ones arriving over time, this very conservative figure can double every year. Does it make sense for Booking.com to use human translation services for all this content?

Surely the translation of an exquisite New Year’s hotel menu is not what we have in mind when we think of machine translation in the travel industry. We know this only too well at Pangeanic: never publish as raw MT output any brand-level document that has to face the public and convey the idea of professionalism and service. Nevertheless, there are countless other documents which are well suited to machine translation. They are not so client-facing or, again, are low-value because they are ephemeral content.

Common Sense Advisory is a Boston-based research center for the translation industry. If you follow our blog, you are probably familiar with their “Can’t read Won’t buy” report, which says that 88% of people are more likely to make a purchase if the information they see on the web is in their own language. In Europe, the EU published a report with similar conclusions: Europeans choose websites in their mother tongue and most do not feel comfortable making a purchasing decision in any language but their own. For many, understanding well what they were about to buy was more important than price. Positive or negative comments about a hotel or establishment have a deeper impact in a person’s native tongue. They boost engagement, even if the translation is not perfect.
It is one of the mantras of selling: buyers are put off when they feel the seller has more information than them during the sales process. When the seller provides information and knowledge in order to bring the buyer to an equal footing, buyers are more likely to buy because they handle the same information as the company they are buying from, or even more. Transparency is key, and machine translation is often key in providing such information: it is immediate and it is neutral, as it is the product of an algorithm in translation software. Thus, machine translation services enable the travel industry to reach new potential clients anywhere at a fraction of the cost of setting up a representative office. Also, the web centralizes and scales up their business. User-generated content can serve the purpose of regular updates which, linked to a clever keyword strategy, can bring several benefits, traffic and conversions. It is hardly news that around 80% of tourists and business travelers weigh reviews by fellow travelers before making a booking decision. And as machine translation can only get better with new techniques (neural networks or deep learning), MT is here to stay and to become embedded in all travel companies’ websites.
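The Booking.com volume estimate earlier in this section can be checked with a few lines of arithmetic. The following is only a rough sketch: the figure of 540,000 hotels is an assumed stand-in for "well over half a million", and the function names are hypothetical, chosen purely for illustration.

```python
def estimate_review_words(hotels, words_per_review, reviews_per_hotel=1):
    """Return the total number of source words across all hotel reviews."""
    return hotels * words_per_review * reviews_per_hotel


def project_yearly_doubling(initial_words, years):
    """Project the word count if the volume doubles every year."""
    return [initial_words * 2 ** year for year in range(years + 1)]


# Assumption: 540,000 hotels stands in for "well over half a million".
HOTELS = 540_000
WORDS_PER_REVIEW = 50

baseline = estimate_review_words(HOTELS, WORDS_PER_REVIEW)
print(f"One review per hotel: {baseline:,} words")

# Year 0 is the baseline; each later year doubles the previous one.
for year, words in enumerate(project_yearly_doubling(baseline, 3)):
    print(f"Year {year}: {words:,} words")
```

Under these assumptions the baseline is 27 million source words before multiplying by the number of target languages or by the number of reviews per hotel, which is why the figure is described as very conservative.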

But… Shouldn’t all my multilingual web content have the highest translation quality?

No, not really. The traditional, client-facing publications that convey the image of your company as a brand should, for sure. This includes marketing brochures, hotel descriptions, reports, menus, magazines and newsletters, promotional material, in-flight entertainment, user generated reviews, and of course social media posts. But user-generated content is mostly important as feedback and as collateral information to other users. Some may be very relevant; the majority of users’ comments may not. And this is the reason why machine translation solutions go hand in hand with human post-editing solutions. It depends on the final desired quality expectation. Light post-editing will make the content fairly acceptable to humans but never to the level of human, professional translation services. Full or heavy post-editing involves a deep revision in order to make the document indistinguishable from what would have been a human translation. This often requires the work of a first post-editor and a final proofreader. Heavy post-editing is ideal for content like resort and hotel descriptions, etc.

The Pangeanic difference

Here at Pangeanic, we appreciate that it’s vital for travel industry insiders to be able to translate languages fast and in an uncomplicated way. That is why our international hubs and online translation management platform cover translation needs internationally across a variety of media channels and cater to the needs of travelers at every point in the buyer journey. We offer cutting-edge translation technology and in-country linguists who can cover more than 150 languages, from Portuguese to Russian in Europe, from Japanese to Gujarati, Pashto and Indonesian in Asia. Our Chinese office can cover Simplified Chinese, Traditional Chinese and Taiwanese. As one of the best US translation companies, with a UK translation company, and offices in Spain, Japan and China, Pangeanic specializes in localizing marketing material, hospitality and hotel websites, ecommerce, travel apps, video, travel reviews and much more, so give us a call or email us today to find out what we could do for you. – See more at “Translations for the tourism industry”

Next time you think languages, think Pangeanic
Translation Services, Translation Technologies, Machine Translation


Medical Translations: Quality Matters

by Manuel Herranz

When you think about two different jobs, doctors and translators do not come to mind as two related professions.

But the fields of life science and medicine and translation services do share at least one important feature: you never call the doctor until you need one. Likewise, you never search for translation services until you really need a translator. But the same could be said about the legal profession and legal translations or the engineering industry and technical translations. The translation industry is a multifaceted industry and professional translators are supposed to be experts or knowledgeable about many fields. It is not a small industry either, and myths about the translation industry are disappearing as technology has been able to automate many processes. Ideally, experienced translators working in any of those particular areas of knowledge will help improve the conditions of each industry, and in particular the medical industry.

All translation agencies / translation companies offer different levels of service

These depend largely on the target audience, that is, the final intended use and final readers. We offer 4 general translation levels at Pangeanic, plus others which may involve machine translation and lighter or heavier post-editing (revision by a human translator). Generally speaking, the more critical the application, the more eyes are going to read the document. The more serious the consequences in case of an error, the higher the translation level and the more verifications, quality control and translation reviews and stages a document will have. We could classify translation levels as follows:

  1. Fast Translation (one linguist): This is purely a translation service for internal materials or when one translator and his/her own proofreading suffice. No serious publication is expected. The buyer of translation services can put up with some errors as speed of delivery is more important than the beauty of the expression.
  2. Standard Translation: Translators with experience in a given field produce a first, quality translation. They are quite familiar with the subject area and are able to verify their own translated document. This is later checked by an expert Project Manager and returned to the translator for final approval.
  3. Premium or High Quality Translation Services: This translation service requires an expert translator to produce a highly technical, medical or legal translation. His/her version is then carefully checked and proofread by a different translator for terminology, style, expressions and accuracy. The proofread version is returned to a Project Manager who also has experience in the field and to the first translator for final approval. This process is rather lengthy, but it is required not only for the technical areas above, but also for marketing translations where a transcreation of the original is needed.
  4. Proofreading Only: If the person writing has a high command of the target language, he or she may require a native speaker to read his or her first version.

Clearly, all medical translations and life science translations would fall under the category of “Premium” or “High Quality Translation Services”. But why? While the different levels of expected translation “quality” are perfectly understandable in general, the medical and life science industry deals with something that concerns us all: our health, and sometimes life-and-death treatments.

Clinical trials are often undertaken under strict regulations and have to be carried out in several countries, by different institutions or laboratories, in order to be approved at a national level. The EU, for example, has several directives for clinical trials in the case of medicinal products for human use. Clinical trials have to be authorized, they have to comply with certain levels of transparency and they have to be reported. If clinical trials are conducted outside the European Union but submitted in an application for marketing authorization for countries in the Union, they must still follow principles equivalent to the provisions of the Clinical Trials Directive. Expect this to be the same in most countries, and in different languages.

This is one of the reasons why translation for the medical industry has to satisfy higher expectations. There is not only the potential failure of the trial itself: any small translation error or misunderstanding, any medical terminology error, can have devastating consequences for budgets and time-to-market, not to mention the misuse of a drug, pharmaceutical or medical device. Quality matters in life science and medical translations if only because a large number of people, from doctors to nurses to consumers (in fact, the entire supply chain), are placing their faith in the accuracy of the translation. It may sound like a commonplace, but quality matters in medical translations because it is life-critical.

Translator selection process for medical translations

The requirements placed on the recruitment of medical translation experts are quite high compared to other disciplines and usual levels of stringency. Proven experience is a must-have for any life science and medical translator, and “being a doctor” is not usually enough. Few doctors have spare time outside their busy schedules to spend translating. This used to be a sales point for some translation companies years ago, but the Internet has changed accessibility and working habits. Doctors and medical personnel seldom have enough time to catch up with the latest developments in the medical field. Considering the sheer diversity of clinical specializations, it is simply not realistic to expect every medical translation to be carried out by a linguistically skilled doctor. Expert medical translators are, precisely, expert translators. They either have previous experience in translation and have specialized in the medical field, or they gained experience during their Translation Studies degree at university and have proved themselves in the market for several years as in-house staff or freelance translators.

PANGEANIC’S USE CASE: Medical Translation for Clinical Trials

Quality Processes in medical translation services

The “experience” is not just time spent resolving word equivalents. It includes the whole translation process and familiarization with several translation tools, terminology databanks and QA tools that ensure that terminology has been adhered to and respected. When medical translation projects are big enough, the translation company must put a team to work. This means that the translation company must have a structure and a good Internet connection in place, plus confidentiality agreements and the means to ensure that information is treated confidentially. Translators working remotely cannot keep a copy of the work. Only translation agencies with enough resources and technology can quickly assemble an efficient team for translation, proofreading, and terminology management and checking.

Did you enjoy reading this blog? Read about our experience with medical devices and medical trials at: Life Science and Medical Translations


American English Translations or British English translations – it matters

by Garth Hedenskog
Any native British English speaker like myself will be able to recall a moment while using Microsoft Word® where their carefully chosen words get automatically corrected to American English – the default setting: “The shopper stood there momentarily to analyse the colour of the garment” corrects to “The shopper stood there momentarily to analyze the color of the garment.”

Both American and British English are dialects of the English language, the second most spoken language worldwide. This makes the language an essential tool for people to express their opinions in their respective fields, particularly in the ever-growing medical and pharmaceutical industries where English remains the only literary language – a fact that is largely accepted within the scientific community.

English was introduced to Native Americans in the 17th century during the colonisation of North America – and the worldwide strength of the language parallels the unrivalled maritime strength of the British Empire at the time. From the first confirmed British colonisation of Jamestown, Virginia in 1607, the English spoken in the UK and the English spoken in North America began to diverge – and this has led to the formation of American and British English, two key dialects in the language today.

In recent history there has been more of a friendly rivalry between the North American and British peoples, which has moved away from the animosity of the 17th and 18th centuries partly due to the formation of the ‘special relationship’ in the 1980s. This ‘rivalry’ often manifests itself in the way a person from the UK mocks a person from the USA (or vice versa) due to the differences in dialects – particularly in certain sayings and idioms, as well as certain meanings for words which may differ ‘across the pond’. One example is “rubber”, which in British English may refer to a tool to erase pencil, whereas in American English it is a common slang term for “condom”.

It would seem therefore that it is imperative for a translation company to be able to offer the choice of American English translations or British English translations by native speakers so as to correctly and professionally translate a document for the intended target audience, whether it is for the North American or UK audience.

It is just as important, if not more important, to take particular care when translating a document into a specific dialect, e.g. Spanish into American English, or even translating between dialects, e.g. British English into American English. This aspect of translation is sometimes neglected, particularly by unscrupulous companies who rely heavily on machine translation with minimal post-editing. Using the incorrect dialect could have disastrous effects for a company attempting to break into either market, and could see potential profits fall drastically. A company with publishing needs must therefore have complete trust in a translation company such as Pangeanic, with specialist native American English or British English translators at its disposal to ensure accuracy and professionalism in document translation.

Can You Trust Free Web Language Converters To Do The Job

A big question we get asked by consumers is why they need a language translation service. If you go online, you’ll often find the search engine offers to translate the page for you. Google in particular offers what might seem like a brilliant translation service. The best news? It’s completely free, and you don’t even have to set it up. When a consumer or user views your site they’ll get the chance to translate it themselves. But that might not be the best idea if you want your site to be successful online and here’s why.

 

Lost In Translation

First, you might think that a free search engine translator is doing a fantastic job, converting your site into any language. Well, for a free, automatic service it is. But you should understand that that piece of software will be converting your site at a basic level. While it might be possible to understand what your site is about from the translation, important details might be lost. If you think about it, there is a basic structure to most languages. But underneath that structure, there’s a subtext. For instance, in Italian, nouns have masculine and feminine genders. These simply don’t exist in the English language. A free language translator isn’t going to pick up details like this through the conversion. But there could be bigger issues as well.

 

Blind Spots

Often when you use a translator like this on your site, you will find a blind spot. This is an area of your site that the translator hasn’t been able to convert. This could be due to the wording or a fault in the subtext. It might be because you’ve written in a language variety that the software could not translate. For instance, in China, there are a number of different written varieties. Not all of these will be translated for free by a search engine.

 

That’s Not What I Meant

If you use a free translation tool on a search engine like Google, you’d be amazed how many times it’s reversed the meaning of the words completely. The best way to test this is to get a bilingual native speaker of the language to check it. They will often find your website isn’t saying what it should. That’s a problem, particularly if you’re looking to attract new foreign customers. This brings us to another point.

 

Unprofessional

Your site might look fantastic in your native language. It’s a clear site that tells the customer what they need to know. But after a free language conversion, it’s a different story. Due to blind spots, incorrect translations or lost details, your site now looks a mess. It might not be somewhere an international customer would want to buy a product from.

 

Can You Rely On It?

The last issue is the problem of reliability. Despite all these flaws you might still think you can count on a free translation service for your site. You might be able to… if they always worked. But the fact is some search engines have them; others don’t. It depends on whether the specific user has the plug-in installed. Then there’s the issue of phones. On many phones, these free translators don’t work at all. It’s for this reason that a professional translation service could be the best option for your website.

Break Through The Language Barriers In Your Business

Business owners overlook the importance of making sure that their company is ready for international trade. They don’t put the effort in to ensure that their business is appealing to people from all around the world. This is a big mistake and comes with several disadvantages that you will certainly want to avoid.

Unexplored Demand

By not making an effort to make your business open to an international community you will have unexplored demand. What this means is that your company might be doing great in the local market. But, there could be potential to repeat that success internationally. However, if you don’t cater to an international audience, you’ll never find out. A big part of that is making sure your company materials are being translated into different languages. It doesn’t take a lot to make a local business international. But without a translation service for your company, you could be facing a massive block in the road.

 

 Missing Customers In Your Home Country

Of course, it’s not just an international market that you might be missing out on. If you’re not using translation services, you could be missing out on demand at home as well. These days the world is a melting pot. There are more than 37 million native speakers of Spanish in America. If that number shocks you, we’re glad. It shows that if your business in America isn’t prepared for Spanish speakers, that is a massive potential demand missed. Don’t forget that demand could translate into huge additional profits for your business.

Losing Investor Interest

We are seeing more and more how important international investors have become for businesses. For instance, businesses in America are commonly funded by investors in countries like China and India. These investors could provide a lifeline for your business. But that will only happen if your company is accessible to them. If there’s a language barrier, they may look elsewhere for a business that is already prepared for international transactions.

Added Value

Making your business prepared for international trade makes it more valuable. One day, you might consider selling your company on. If you want it to fetch a good price, making sure it’s a player on the international market is a good step to take. Business buyers know that it is increasingly hard to make profits based on local demand alone.

Cheap To Run

If you use a translation service for your company, you can get rid of the language roadblock. It’s a lot cheaper than hiring full-time translators for your business. This is a service that you should think about outsourcing. A translator is often seen as a specialized business resource. This means that they can charge more than they perhaps should, particularly now that translation services are in such high demand on the foreign market. By outsourcing, you’ll be keeping your costs low while staying prepared for future foreign trade.

Thus, it’s in your best interest as a business owner to break those language barriers down. If you do, your company will be in a stronger position on the world market.

 

 

Translation Has Become A Vital Part Of Manufacturing

The importance of translation has been growing in the manufacturing industry, for the reasons discussed below. Technical translations, in particular, present a challenge. It’s crucial that instructions to different parts of the business or to outside companies are followed to a tee.

Outsourcing

Lately, businesses of all kinds have become more reliant on outsourcing. The manufacturing industry is no exception. In fact, it’s usually the industry that’s leading the charge. Outsourcing is the process of using out-of-house resources to complete a job within a company. This is usually done to save money, time, and resources. Theoretically, an outsourcing company could be just down the road from the business. But usually, these transactions occur on an international level. The reason for this is that outsourcing businesses often stay profitable due to cheaper labor, more lenient labor laws, or tax incentives, which are usually found in regions other than the buying company’s own. For instance, China is famous for its lenient labor laws. Many major businesses like Apple use outsourcing companies there.

If you are outsourcing to international companies, the language must be clear. You cannot afford to lose anything in translation. Accurate technical translations help ensure the product or process is completed to the best possible quality.

International Trade

But it’s not just outsourcing that has shifted the focus to translation in manufacturing. Many manufacturers aren’t catering only to local demand. They are reaching out to an international population. These days it’s cheaper to trade internationally. Businesses can run on a global level while keeping costs low. But again, translation services must be used. Otherwise, it won’t be possible to make business deals and form contracts with foreign buyers.

With the right translation service, business deals can be set up easily, usually online. A face-to-face meeting won’t always be necessary if all the information the buyer needs is readily available.

Getting Leads

To get these international business deals, companies have to be able to generate and use leads. When a business is looking for leads in a country with a different native language, translation again plays an important part. The manufacturer’s website needs to be available in that language, so businesses know what they’re buying. Again, technical translations are important because they can be tricky. If one small detail is missed, it can completely change the concept of what’s being offered and sold. Business leads are only going to be interested in buying if they know exactly what they are looking at.

International Workforce

Lastly, there’s the workforce. In the USA alone, a massive section of the population does not speak English as a native language. However, these employees may still be the most skilled workers for the job. A manufacturing company will be able to hire them only if the business materials they need are readily available in accurate translation.

This is why manufacturers have become one of the biggest users of translation services.