Author Archives: Manuel Herranz

About Manuel Herranz

Manuel works at Pangeanic, a translation company and developer of language technologies with a focus on language automation, machine translation, translation APIs, and web technologies for website translation, technical translation and multilingual desktop publishing. Pangeanic was the first translation company to successfully implement the Moses machine translation system in a commercial environment as it was released from academia (AMTA 2010, Euromatrixplus.net).


Agile Localization and Continuous Localization – The evolution

Our people here at Pangeanic have been around the block when it comes to translating and localizing products for international markets, from website translations to software. These are the people you would call "experts in the field." However, in order to keep their guru status, they have to keep up with the new ways content is produced and, obviously, the new ways it has to be delivered. Translations are not "given" to users any longer. They are expected in various shapes and many formats and, often, they are expected immediately. Sometimes, they need to be produced for intermediate versions and releases. We have all heard the terms Agile Localization and Continuous Localization, but do we know what they truly mean? Let us talk to Garth Hedenskog, Director of Sales at Pangeanic, and Manuel Herranz, Pangeanic's CEO. Garth deals with clients every day and is aware of their increasing need to obtain faster, better translations.


         Garth Hedenskog, Director of Sales at Pangeanic

A little background: Why would we call you an expert?

Garth: I have amassed 10 years of experience in the translation industry, beginning when I moved to Spain in 2009 and began working at Seprotec, the largest Spanish LSP. From there I moved on to Mondragon Lingua, part of the large co-operative corporation. I was the leading salesperson for international markets at both companies, so I can say I have known first-hand the needs of many clients who require traditional translation services (what we may call "cascade translation" or "cascade localization"). The needs of clients are changing because the markets are changing. We live in a world where things are happening every second, and they are shared and published online. We have more and more apps and a lot more software being released. Market opportunities are everywhere. The work of translators and translation as a profession is evolving. The market has changed considerably over the last 5 years with the advent of machine translation, web crawling, and massive databases that can retrieve data in no time, to name just a few technologies. This has enabled translation companies to offer much faster services for clients who do not require human-quality translation, or who do, but whose product is still under development and who look to a "single source of truth" while different departments develop and write the documentation. Even the interface may change. The terms I'm about to deal with are used mainly when developing software for international markets, and they define different translation project management models or methodologies.

Waterfall Localization

The waterfall model refers to the traditional linear approach, based on complete versions of a product being released once the master version is ready. This is the translation method we are all familiar with from the '80s and '90s. Once all the functionality of a product was tested and ready, release would follow. This is still essentially the method in many regulated industries, for example when we are dealing with medical translations or the translation of technical manuals. In software, it meant that a project would begin and follow linear steps for translation. It would be "released" once all steps and all pieces were complete.

Agile Localization

Agile Localization follows the idea of software iterations rather than the traditional "final copy, ready" publishing. Translation is embedded in the process; the aim is to get the software ready during the development cycle. There is a very informative website about what the Agile movement is and how scrum fits into it. For example, in traditional companies there is a technical department and a sales or business development department. An agile corporation would not have a clear-cut division between a "business side" and a "technical side" working like separate bodies. Teams work directly on delivering business value, because we get the best results when we involve the whole business. Agile localization is usually done in group stages containing intervals referred to as "scrums" (we could break each interval or iteration of the scrum down into a "sprint").

Continuous Localization

We could say continuous delivery is a subset of the Agile approach, but in this case the product is ready for release at any time during the development cycle. This has been happening a lot lately, as it is a particular feature of mobile apps and mobile videogames. We are all familiar with apps and games being updated almost daily on our smartphones. These tend to be small updates or enhancements rather than full version releases. With a continuous localization approach, you don't stop development and then make a release; that only happens with major releases. For us in the translation industry, this means a constant flow of data (sometimes very small batches) that has to be handled in a very automated way. Otherwise, it makes no sense financially, as minimum charges eat up the whole translation budget. Our ActivaTM database is great for this because it can provide a "single source of truth": even if one translation job is not finished, updates can arrive and be applied to the same body of content while translation is ongoing.
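To make the idea of a constant, automated flow more concrete, here is a minimal sketch of what a continuous localization loop could look like: a script that periodically asks the content repository for strings changed since the last sync and pushes each small delta out for translation. The endpoints and field names are hypothetical placeholders, not a real Pangeanic or ActivaTM API.

```python
import time

import requests  # third-party HTTP client, used here for brevity

# Hypothetical endpoints: neither URL nor the field names reflect a real
# Pangeanic or ActivaTM API; they only stand in for "the content repository"
# and "the translation pipeline".
STRINGS_ENDPOINT = "https://example.com/api/strings?changed_since={ts}"
TRANSLATE_ENDPOINT = "https://example.com/api/translate"


def poll_and_translate(last_sync: float, target_langs: list) -> float:
    """Fetch source strings changed since last_sync and push each delta for translation."""
    changed = requests.get(STRINGS_ENDPOINT.format(ts=last_sync)).json()
    for item in changed:
        for lang in target_langs:
            # Each small update travels on its own; nobody waits for a "final" release.
            requests.post(TRANSLATE_ENDPOINT, json={
                "id": item["id"],
                "source": item["text"],
                "target_lang": lang,
            })
    return time.time()


if __name__ == "__main__":
    last = 0.0
    while True:                      # continuous: the loop runs alongside development
        last = poll_and_translate(last, ["es", "ja", "de"])
        time.sleep(300)              # look for new or changed content every 5 minutes
```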

Deep learning – The day language technologies became a Christmas present

It is said that the third Monday of January is the saddest day of the year. It does not take deep learning to feel it. A long vacation period has ended, with no sight of another one for several months. Overspent, overstuffed, with no more presents to exchange, with winter settling in in the Northern hemisphere and missing the drinks and chocolates that made our sugar levels go sky high, many start booking holidays in the sun. But let's turn the clocks back to Christmas, and we will remember the last few weeks as the Christmas when language technologies made it to the top of the list. Millions of people, literally, have opened boxes whose content was an electronic assistant with a rapidly improving ability to use human language. Two main products stand out. The first is Amazon's Echo, featuring the Alexa digital assistant, which sold more than 5m units. In essence, Echo is a desktop computer in the shape of a cylinder. There is no keyboard, no mouse, no monitor, no interface – just voice.


Amazon – Echo’s Alexa

Google Home, the second product, can do quite a few things Echo can't: for example, contextual conversations or sending images and videos to your TV set, and it seems pretty good at playing music. Back in March, Amazon released its own API for the Alexa Voice Service (AVS), the service that powers the Amazon Echo, Echo Dot and Amazon Tap. The move means that you can use Amazon's Alexa in third-party hardware. Amazon's final goal is to sell hardware, and it has become very good at doing so. Google has a long history of glasses, cars and collateral products that have been discontinued. Although the best artificial intelligence engineers favor Google over most companies, the battle out there is becoming a war for talent. Amazon acquired the ailing machine translation firm Safaba in late 2015 not so much to add the company's technology but mainly to bring in chief scientist Alon Lavie and create Amazon's own MT and R&D department.

Back to Alexa. Try asking it for the weather (the typical easy question), to play music, to order a taxi, to tell you about your commute from home to work, or even to pick a joke for you, and Alexa will comply.

These two have turned the first such technology, Apple's Siri, into a second choice (my experience with Sierra's voice control on a laptop is far from satisfactory) and Microsoft's Cortana into a minority choice (for now).


Siri says it can do a lot of things but….


it cannot understand “Find Echo Alexa photos on Internet”

So how did computers become so clever at interpreting language and tackling the problems of human language? More importantly, will the problem of communication across languages soon be a thing of the past?

How did machine learning become so clever in language technologies?

For many years, the basic idea was to code rules into software: "IF" this happened, "THEN" something else had to happen. In translation, this meant building a list of grammar rules that analyzed the source language, better or worse, and another set of grammar rules for reproducing the meaning in the target language. We now know that the initial optimism of the 1950s and 1960s led to the infamous (and flawed) ALPAC Report, which put a stop to machine translation research for decades. Human-language technologies all but disappeared from the research scene until the availability of bilingual data sets renewed academic interest in the 1990s – but now within the realm of pattern recognition and machine learning.

Nowadays, most automated translation systems are based on one form or another of statistics. Practically all incorporate self-learning or updating routines (what we once called DIY MT) so that translation engines can improve gradually with new data.

In the case of speech recognition (a closely related science, also heavily dependent on pattern recognition), the software is fed sound files on the one hand and approved, human-written transcriptions on the other. By matching the two, a pattern of equivalences is established. The system thus learns to "predict", with a high degree of probability, which sounds should result in a particular word or phrase.

This is not far removed from machine translation, hence the ultimate goal of the "speech-to-speech" translator. Machine Translation, or Automated Translation as it has come to be known lately, gathers parallel bitexts that have been previously translated by humans. Algorithms detect the frequency of strings of words (or n-grams) across the two languages. The closer the languages, the better the resulting translation, as less re-ordering will be required. There are other features to improve the final output, obviously. Having large monolingual samples to create a "language model" will help smooth out the final result to a certain degree, as the system's statistical guesswork is narrowed down noticeably. However, the last ten years of SMT have produced enough engines, metrics and examples to know that languages that are not even remotely related grammatically, or that are highly inflected (Slavic languages like Russian or Polish, Baltic languages), perform worse with pure statistics, as inflectional cases impact the statistics negatively. Some companies have been able to "beat Google", the benchmark and mother of all machine translation services. PangeaMT beat Google in English-Korean, for example.
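As a toy illustration of the statistical idea (and only that: real SMT systems train on millions of sentence pairs and use alignment models, phrase tables and language models), here is a minimal sketch that counts how often source and target n-grams co-occur in a tiny, invented English–Spanish bitext. The corpus and the choice of bigrams are assumptions made purely for the example.

```python
from collections import Counter
from itertools import product

# Toy aligned corpus (English–Spanish). Real SMT systems train on millions of pairs.
bitext = [
    ("the house is red", "la casa es roja"),
    ("the house is big", "la casa es grande"),
    ("the car is red", "el coche es rojo"),
]

def ngrams(tokens, n=2):
    """Return the n-grams (here bigrams) of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Count how often each source bigram co-occurs with each target bigram.
cooc = Counter()
for src, tgt in bitext:
    for s, t in product(ngrams(src.split()), ngrams(tgt.split())):
        cooc[(s, t)] += 1

# The most frequent pairings hint at likely translations ("the house" <-> "la casa").
for (s, t), count in cooc.most_common(5):
    print(f"{s!r} <-> {t!r}: {count}")
```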

The statistical approach has been made possible thanks to huge improvements in computing: most of us carry a smartphone in our pockets, which represents several times over all the technology our grandparents ever handled in their whole lifetime. The wide availability of data, which makes customization possible at great speed, is another factor. Lastly, the latest buzzword: deep learning. Google introduced neural machine translation in 9 language pairs in November 2016, following a tested improvement in translations from Chinese into English. The nine languages are a mixed bag of "easy transfer" (English, Spanish, Portuguese, French), somewhat difficult (German, Chinese), and known hard nuts to crack like Japanese, Korean and Turkish.

What is deep learning?

Answering this question would require a full article. In short, let's imagine that you lean over the window and look at the sky. It is cloudy and grey. It is likely to rain, and although you have the tickets, live only 5 minutes from public transport and your best friend has said she'll come with you to the concert, you decide you won't go. This decision has taken you milliseconds. For you, the possibility of getting wet on the way to or from the concert outweighs all other factors (your friend coming with you, having tickets, using only public transport downtown). You use powerful neurons to think and take a fairly logical decision: "I have already spent money on tickets, my friend is coming and the concert is just a short ride away, but I hate the rain, and I don't fancy getting wet at all in the middle of winter". What seems so logical to a human isn't necessarily so to an algorithm (let's call it a machine). Teaching it to take several factors into consideration, weighing some over others and adding an element of uncertainty, is close to what users see as speech recognition or machine translation magic.
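A minimal sketch of that weighing process, reduced to a single artificial "neuron": each factor gets a weight, the weighted sum is compared against a threshold, and the decision fires or not. The weights below are invented for illustration; in a real neural network they are learned from data rather than set by hand.

```python
# A single artificial "neuron": weighted inputs and a threshold.
# The weights below are invented for the example; a real network learns them from data.
def will_go_to_concert(raining: bool, friend_coming: bool,
                       has_tickets: bool, short_trip: bool) -> bool:
    weights = {"raining": -4.0, "friend": 1.5, "tickets": 1.0, "trip": 0.5}
    score = (weights["raining"] * raining
             + weights["friend"] * friend_coming
             + weights["tickets"] * has_tickets
             + weights["trip"] * short_trip)
    return score > 0  # "fire" (go out) only if the positives outweigh the rain


print(will_go_to_concert(raining=True, friend_coming=True,
                         has_tickets=True, short_trip=True))   # False: rain wins
print(will_go_to_concert(raining=False, friend_coming=True,
                         has_tickets=True, short_trip=True))   # True
```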

A neural system uses several layers of digital "neurons" and connections between them (i.e. a digital neural network), which might resemble the image we have of our own neurons at work. Such systems require a lot of GPU (not CPU) computing power and are becoming extremely good at learning from the examples we humans provide. The disadvantage compared to purely statistical or hybrid systems is the huge computing power required to train and retrain the models. This cannot happen on most company servers. Therefore, Echo and Google Home use their software as a capture tool and the calculations are performed in their own data centers. Data is transferred to their systems and, potentially, so is the way the user uses the system. The trade-off is that the system improves with usage and becomes more and more reliable and useful.

The implications with Big Brother watching and privacy are clear. However, if the use of online email, social media and smartphones is anything to go by, Amazon and Google are going to help us  a lot and become more intrusive and yet indispensable in our daily lives very soon.


References

  • Can Amazon Echo beat Google in the long run?: http://www.forbes.com/sites/quora/2016/10/24/can-amazon-echo-beat-google-in-the-long-run/
  • 8 things Alexa can do that Google's Assistant can't: https://www.cnet.com/how-to/things-alexa-can-do-that-google-home-cant/
  • 9 things Google Home can do that Alexa can't: https://www.cnet.com/how-to/amazon-echo-alexa-vs-google-home/

Some takeaways from TAUS Summit Portland

The TAUS Yearly Summit in Portland was a great event and the largest I have attended so far (and I have been a regular attendee since 2007 in Brussels). The organization has definitely grown from being considered a think-tank promoting the exchange of data for the benefit of machine translation engine training into a developer of useful tools for the industry. There were times when only experts and a few EU officials or managers from large corporations attended. The mixture in the audience and the quality of the keynotes prove that TAUS has grown into a major reference conference for decision makers and translation technology implementers in the language industry, far removed from service LSPs' conferences.

We are going to be postediting and leaving the TM syndrome behind. Translators will need to face the reality and the realm is post-editing – Tony O’Dowd, CEO, KantanMT.

Unfortunately, I missed the first day of the conference due to a flight connection delay. I can only report on the interest raised by a tool from the Spanish company Prompsit which aligns bilingual datasets from websites.

On the second day, Jaap reviewed the history of TAUS and the increasing questions on whether there will be a need for translators in 20 years' time, given the speed of technological progress in the language industry. There is indeed an open concern among translators about the future of their jobs and skills, and venture capital is spending profusely on financing crowdsourcing and post-editing companies. The future of a typical language service business may be at risk in 20 years' time, who knows. What is clear, however, is that savvy translation companies are becoming something else nowadays. They are focusing on language delivery, content services and website localization, and machine translation is playing a big part in that.

Keynote by Margaret Ann Downling

Margaret is not a translation industry specialist, but she has been a buyer of serious translation services for many years. Her opening lines drew attention to the "skilled workforce" syndrome: 30 years ago, Ford was the largest employer in the US, paying $35/hr. In 2016, a sign of the times is that the largest employer is Walmart, paying $9/hr without any benefits.

The publishing industry has been suffering for many years and sees translation and fast publishing as a way out, tapping new markets. But costs are too high for large magazines. There is a growing divide between what is learnt at school and what the market and companies demand: what we teach kids, and the way we teach it, is no longer valid in an almost completely digital society.

Content is fast, consumed in minutes and seconds; it has to be clickable and SEO-friendly, like "How to learn a language in 2 weeks" or "Protecting our planet starts with YOU". These are the superheroes of digital content. This is leading to Generation C – tired, sick and bored, all those born after 9/11 – simply switching off from the massive amounts of data, news and messages bombarding them every day.

But for companies, organizations and institutions, a content strategy cannot be undone: once you have taken a decision, it is like breaking an egg – it can be fried or scrambled, but it will never go back to its original shape. The same happens when you translate, or machine translate, without any thought: automation yes, but to a degree. You cannot rely on automatic translation to fast-translate all your content blindly. For Margaret, editors are the key: they make sense of the world around them through their curation and creation. They are the architects of our society.

Audiences are your board members. Engagement is the key. Transparency is key and success is measured by authenticity. It cannot be machined. "Content is to the mind what healthy food is to the body." If we get 100% accuracy, we solve a lot of communication problems, but a publishing company deals with tone and with people. By pairing up great editors and great translators, great content is created. An interesting final remark by Margaret, who claimed not to be a technology expert when it came to translation: "The whole world of translation matches and translation fuzzies just does not apply to my world of magazines and quality content, quality information publishing."

To Cloud or not to Cloud – How cloud are you?

In this session, most respondents (78%) said that their localization tools and data were kept in some kind of cloud or mixture, which seems amazingly high for an industry where about 92.1% of operators are sole proprietors and 66.3% of workers are self-employed. Private clouds reign, although something similar has been around in the shape of VPNs and similar setups for many years. The difference is that current clouds can be extended and are more flexible than in the past.


Maxim Khalikov from Booking.com on Machine Translation Application

The discussion topics ranged from the benefits of the cloud and whether teams were working at the "speed of the cloud" to the lack of automation and, above all, the very slow integration between CMS and TMS systems. Jack Welde from Smartling chaired a session on the interesting topic "Tools – Are you better off building them or buying off-the-shelf products?". In general, people tend to build what is core for them, but that is expensive, and for many companies a TMS is not core, so they buy off-the-shelf products such as SDL WorldServer (may I add that our own ActivaTM can hold enormous language databases and, being based on ElasticSearch, is API-ready for all connections). In the case of SDL WorldServer, each box costs around $40k per server, and many companies are simply not there to program a TMS. Machine translation is definitely not seen as strategic or as something to own by most companies; they wouldn't know where to start. That's why there are so many programmers building more or less the same tools for different language companies. Many can't find what they are looking for off the shelf, and thus most build because they have to.

Jack went on to mention that most translation buyers see translation as a cost center rather than an opportunity, a tool for revenue generation. Spending money on translation needs to be justified, whereas it should be seen as an investment in marketing and sales opportunities. As growth is coming from international streams, companies are investing more in translating content, or at least thinking of it as a revenue center and not just a cost center.

As translation companies feel the pressure to keep up with technological trends and new financial models, the translation profession is [slowly] adapting to new needs and new scenarios. It is unfair to talk about a single translation solution or even a single type of translation company, just as it is unfair to put all translators in the same bag. This became very clear at the recent TAUS Summit in Portland, where 10 disruptive innovators from outside the industry and 9 inside innovators presented their lines of research. From multilingual translation platforms including voice-to-text commands for immediate translation (almost a multilingual secretarial service) to neural machine translation, links to big data, karaoke-style subtitling and new ways of improving translation memory recall by running a TM that is separate from CAT tools – attendees saw the best of what is to come in the language industry.


Why do languages die?

If you ask people in the street how many languages they think there are in the world, answers will vary. A joke says that a random sampling of New Yorkers produced inspiring answers such as "probably several hundred." Clearly, this is quite far from what we know today. Funnily enough, estimates have escalated over time. In 1911, the Encyclopedia Britannica implied a figure of somewhere around 1,000 languages in the world at the time. That number grew during the 20th century. To date, 7,097 distinct languages have been catalogued by Ethnologue (published by SIL International), the most extensive catalog of the world's languages and generally accepted to be the authority in the field. Of course, the number of languages has not multiplied in 100 years; in fact, languages die. What has changed is our understanding of what a language is, and of how many languages are actually spoken in areas that had not previously been researched. Although it is hard to know any of this precisely, Ethnologue estimates that 34% of those 7,097 are "in danger" or "dying" languages. Religious and missionary organizations have historically been responsible for a lot of the pioneering work in documenting the world's languages: SIL International has an interest in translating the Christian Bible, which as of 2009 had been translated (at least in part, if not the entire work) into 2,508 different languages – still a long way short of full coverage.


Endangered languages?

There are several organizations whose mission is to document and collect data on the world's languages, from dormant to endangered. UNESCO says that half of the world's languages will disappear by 2100. That means over 3,000 languages wiped off the face of the Earth. A quick look at the work of organizations like The Endangered Languages Project shows that in countries like Spain there are at least 6 languages which may disappear soon as a result of cultural colonialism (and they are not Basque, Catalan or Galician), just as in other places the cause will be globalization, war or climate change. The Foundation for Endangered Languages does not provide a more optimistic outlook. Whilst "world" languages such as English, Spanish, Mandarin Chinese, etc. are becoming increasingly valuable, small tribal languages are becoming endangered. Minority languages in Europe do not fare better: out of its 287 languages, 52 are categorized as dying and a further 50 are in danger. The Times of India reports that India has lost 20% of its languages since 1961.

Why do languages become extinct?

Colonialism

Many of today's endangered languages are tribal. This means that they are spoken by a small group of people who have been colonized by a large, sometimes foreign, invader. Historically, colonial powers such as the United Kingdom, France, Spain and Portugal have made their languages dominant in the Americas and Asia. But let's look at Northern Cameroon, where Ngong is close to extinction with only two speakers. The menace comes from other nearby populations, not from colonial French. Regions that have lived under foreign rule have some of the highest rates of endangered languages. For instance, although Greenland has a national language, Kalaallisut (Greenlandic), spoken by practically all of its 50,000 inhabitants, most are bilingual and speak Danish as well, since Danish is the language considered important for social mobility. The same can be said about the effects of Spanish on Catalan, Galician and Basque in Spain.

War and Conquest

Sometimes a language dies tragically. Wars and invasions, genocide or the displacement of peoples can cause the death of a language as well as of the culture of the people who spoke it. Human history is full of such examples, from the Roman Empire displacing Celtic peoples and languages in Europe and the Iberians in Spain, to colonial Britain and the Native Americans, and the Aboriginal population of Tasmania.

Global Warming

Climate change will soon have a devastating effect on some of the world's smallest languages, particularly those spoken in the Pacific Ocean. The film There Once Was an Island powerfully documents the growing struggle of the inhabitants of Takuu, a small island fighting for survival in the South-West Pacific. The 400 inhabitants of the island speak Takuu, their own language, a member of the Polynesian family. If they are relocated to Papua New Guinea, Takuu will be absorbed and die.


Globalization

Globalization, the Internet and constant interconnectedness have changed our planet forever, for good and for bad. They have offered the best and the worst in many cases: they have opened access to education but also to higher levels of intolerance, to knowledge and global brotherhood but also to the rise of terrorism and nationalism.

It's understandable that one would use an international language when communicating across the world. English played that role initially, but Asian languages, Spanish, Japanese, Russian, Brazilian Portuguese and Arabic have been taking a larger share of Internet content over the past few years. Obviously, learning and speaking any of these languages should benefit your job prospects and your relationships with other people. Yet even fairly healthy languages that exist today can face extinction due to the effects of globalization. Great efforts are made in Europe to preserve national identities, but regional languages such as Breton, Alsatian, Catalan, Basque or Occitan in France have little chance of surviving a few more decades if there is no government support, as in the Irish case (Irish, not English, is the first official language of the Republic of Ireland). We have already mentioned the roughly 200 languages that have died in India since 1961, mostly as a consequence of the pressure to assimilate. This is largely the case in many communities around the world where parents choose to speak a different language to their children because they feel that, to get on in life, study or work, their children will need to speak a "higher language".

"Ethnologue estimates that 34% of those 7,097 are 'in danger' or 'dying' languages." – Manuel Herranz

3 reasons for a multilingual Joomla, WordPress or Drupal website and 7 things you should not do

by Manuel Herranz and Alex Helle

If you are one of those people who believe that operating in English (or your national language and English as the default international language) suffices to talk to the rest of the world… we regret to inform you that there is a huge misconception in the way you approach the global marketplace. There are powerful reasons to have a multilingual Joomla, WordPress or Drupal website and I would like to help you understand why.

A few months ago we reported in this blog on a study by the European Union that pointed to the fact that 90% of users preferred to visit websites in their own language. The survey, conducted by Gallup, found that Internet users in 23 EU countries prefer browsing and making purchasing decisions in their native languages. You can visit the link above to download the full PDF, but if you do not have the time for all the reading here is a summary:

  • 80% of Internet users used the Internet daily in practically all EU countries (ranging from 73% in Italy to 90% in Slovenia), and the trend was growing.
  • Nine out of 10 Internet users said that, if given a choice of languages in a website, they would always visit a website in their mother tongue.
  • Nearly 20% of Europeans never browsed websites in any other language but their mother tongue.
  • 42% said they never purchased products and services in a different language.

And all this happens in a continent well known for its high level of multilingualism: 19% of Europeans speak two languages, 25% are trilingual and 10% speak four or more languages. Monolingual speakers, at 46% of the population, are now officially a minority. 98% of Europeans think that learning at least one foreign language is important for the future of their children. But although so many Europeans are bilingual, trilingual or multilingual, their preference is still to buy in their native language. Now imagine what this preference might be in large monolingual countries or economies, or in those with a heavily state-sponsored national language, such as China, Brazil, Arabic-speaking countries, Japan and Spanish-speaking Latin American countries. Making a purchasing decision in English may just not be an option.

Clearly, there are many reasons to make your website multilingual. As the European study shows, Internet users who come to your website as global customers will not stay, read or browse, much less buy, if the website is not in their native language. And using Google Translate is not an option.

1. Multilingual Sites Get More Visits… Engagement and Conversions

We dealt with the strategy to follow (multi-site or multilingual) in a previous post. This will affect the way you design and run your international strategy. There are certainly pros and cons to both options, so please read "3 tips translating a website and website localization". Nevertheless, the main point is that offering multilingual content is more important than ever. Newspapers such as Spain's ElPais.com have editions in Brazilian Portuguese and Catalan. Marketing managers are not just after the sale, but after building brand loyalty. And brand loyalty is based on relationships. So you cannot be coy about translation if you are seriously targeting users in a specific area. Users and visitors must feel engaged emotionally with your brand. Otherwise, forget about engagement and conversions.

2. Multilingual Websites Greatly Improve SEO

Before Larry Page and Sergey Brin founded Google, Ilya Segalovich and Arkady Volozh (note the 3 Russians and 1 American behind the development of search engines) had already created Yandex, Russia's largest search engine with 62% of the market. Yandex provides better results in Russian: while Google focused on counting links and weighting the PageRank of Western websites, Yandex's ranking algorithm took a more semantic approach by calculating the distance between words and the relevance of documents to a searcher's query.

Both search engines have been converging for a while, but Google takes just 27% of Russia's search market. South Korea (Naver), China (Baidu) and Japan (Yahoo!) will escape Google's reach for many years to come. This should also illustrate the fact that more than half of all Google searches are carried out in a language other than English. Again, a multi-site or multilingual strategy will affect the way you run and add SEO-friendly content, but the end result is that your whole strategy will benefit from the growing number of keywords. You may choose to centralize everything in one site or offer "local flavors" with national domains (we run national sites such as pangeanic.jp, pangeanic.cn, pangeanic.ru, pangeanic.fr, pangeanic.de or pangeanic.es). Local SEO drives business.
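Whichever route you take, search engines need to be told explicitly which language version of a page is which, and hreflang alternate links are the usual mechanism. The snippet below is a minimal, illustrative sketch that generates those tags for national domains like the ones mentioned above; the locale-to-domain mapping, the sample path and the pangeanic.com default are assumptions for the example, not the sites' actual configuration.

```python
# Illustrative locale-to-domain mapping based on the national sites mentioned above;
# the actual locales, paths and the pangeanic.com default are assumptions for this sketch.
ALTERNATES = {
    "ja": "https://pangeanic.jp/",
    "zh": "https://pangeanic.cn/",
    "ru": "https://pangeanic.ru/",
    "fr": "https://pangeanic.fr/",
    "de": "https://pangeanic.de/",
    "es": "https://pangeanic.es/",
    "x-default": "https://pangeanic.com/",
}

def hreflang_tags(path: str = "") -> str:
    """Build the <link rel="alternate"> tags to place in the <head> of every page."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{lang}" href="{base}{path}" />'
        for lang, base in ALTERNATES.items()
    )

print(hreflang_tags("translation-services/"))
```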

3. Sell More Products and Services

The bottom-line best reason for making your Drupal or WordPress site multilingual? Selling more products and services. The survey shows that 42% of users never purchase products and services in other languages. Localizing your site increases the number of site visits, which in turn drives more leads and sales. By increasing community engagement through your site's multilingual capability, you're going to create an environment that encourages more participation in your online communities. This will help you collect and nurture those important brand ambassadors who are key to boosting sales with referrals, product information and positive reviews.

7 things you should never do to make websites multilingual

  1. As mentioned above: never ignore the language of your target market.
  2. Never use a machine translation program as the representative of your corporate image.
  3. Sometimes you need more than language translation: you need localization or transcreation.
  4. Never make it difficult to find your other websites from your main site.
  5. Never forget about SEO.
  6. Websites are like plants: tend them regularly. Never build a website and then forget about it.
  7. Don’t build a website if you cannot support customer queries.

We will deal with these 7 tips in later posts.

For more information about how Pangeanic Translations & Technologies can make your website translation easy, please visit our website translation services.

Sources:

2011 User language preferences online – Analytical report

2012 Eurobarometer Report “Europeans and their languages”

6 important points for brands writing content for international audiences

by Manuel Herranz
Writing content and distributing knowledge to international audiences can present a number of challenges. The first one is for management to understand the value and ROI of multilingual content and translation into several languages. The second is for the brand itself (that is, its staff, from production to accounting) to believe they work for international markets. They need to be convinced that their salaries and the company's revenue come from people who speak other languages and whose only affinity to them is the brand. Thirdly, traditional channels for the distribution of quality translations need to be complemented (or substituted) by the company's website as a hub for multilingual knowledge, social media, etc.

But we might call those three points the fundamentals; they are prerequisites. I would like to deal with some other points that brands often miss when writing content for international audiences. This is a short guide to help marketing and sales personnel and webmasters make sure they don't make mistakes when they need to translate a website into multiple languages and distribute the brand's content worldwide.


#1: Content preparation and planning

The original content is directed at a particular audience, normally the "home" audience. By spending a bit of time adapting the source material from the very outset, a brand will save time, money and resources in translation. A direct translation only works in the case of instruction manuals. Content needs to be adapted to different target markets, from place names to headline titles, and from the really relevant content to expressions, currencies and measurements – even identifying content that will not require translation because it is simply not relevant for, say, a Spanish or an Italian audience.

Avoid cultural references. They often do not have a good translation, at least one that conveys the full meaning, and translators will have to look for alternatives. Even when the translators are fully professional, the relevance may be lost on an international audience. For example, a reference to "home runs" in content written for US audiences will not translate well into other languages. Expressions such as "taking the bull by the horns" or "going the Full Monty" (as a reference to the film) may translate more or less well from English into some European languages that are culturally close within the context of a business environment, but they will mean nothing in China, Japan or Korea.

#2: Consolidate a writing style

Avoid sentences over 25 words long: short and clear sentences always read better. This does not mean they have to be simple. A wide vocabulary will improve SEO rankings, as variety looks more natural to search engine algorithms. Short and clear sentences will also result in clear translations. Brevity is your ally: you will save on translation and on localization or cultural adaptation.

Related Content – Learn more about multisite and multilingual sites for SEO:
3 Tips on translating a website and website localization

#3: Use editable files

This will help save everybody’s time when it comes to exporting material from a document and importing it back in. If you cannot provide editable files, or graphics have been vectorized, the process of localization will take more time and it will hit your pocket, too. The same happens when working with scanned PDFs or PDFs that cannot be edited. There are ways to solve these issues, true. But they are all workarounds to create another source file. Surely somebody in the content creation chain must have some original, editable files. This also underpins the requirement for careful planning above because we are using somebody’s time. Creating a company database or brand repository can avoid very serious headaches!

#4: Manage graphics with care

Sometimes, illustrations, graphics and images contain text. If they don't, they will obviously not require editing for translation. If they do have text, they may or may not require translation. They should therefore be classified, to avoid a DTP operator or Project Manager having to check each one of them. This will speed up editing and localization.

Ideally, text inside an image should be kept to a minimum. A lot of text means a new layered source file and the translated text restored as a separate layer, which means more desktop publishing work, and more hours. Sometimes, localization in right-to-left languages means creating a brand new illustration file.

Lastly, a working tip: images should never be embedded in a publication – linking is a much better option to make files lighter.

#5: Terminology is important – Ignore it at your own risk

Translating high volumes of content into several languages means managing teams of translators. And even for small volumes, you may need several teams (Japanese translators, a team of French and German translators, etc.)

Availability within these teams may play a part in your publication quality, because nobody can guarantee that the same translator will be available all the time. However, companies need to be consistent across all their channels, day after day, month after month. This can only be achieved if the translation house or the brand's managers work with an official glossary or terminology database. Even a simple Excel file will suffice – it is up to the translation firm to employ professional tools to make these terminology assets available to its translators. This can be done online, by supplying the terminology with the job, or even by pre-translating the approved terms.
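As a minimal sketch of what "supplying the terminology with the job" can look like in practice, the snippet below loads a two-column glossary (source term, approved translation) and pre-translates those terms in a source segment before it reaches the translator. The file layout and the naive longest-first replacement are assumptions made for illustration; production terminology tools handle casing, inflection and context far more carefully.

```python
import csv

def load_glossary(path: str) -> dict:
    """Load a two-column glossary (source term, approved translation) from a CSV export."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0].lower(): row[1] for row in csv.reader(f) if len(row) >= 2}

def pretranslate_terms(segment: str, glossary: dict) -> str:
    """Replace approved terms in a source segment before it reaches the translator."""
    out = segment
    # Longest terms first, so "machine translation" wins over plain "translation".
    for term, translation in sorted(glossary.items(), key=lambda item: -len(item[0])):
        out = out.replace(term, translation)
    return out

# Inline example glossary; in practice it would come from load_glossary("glossary.csv").
glossary = {"machine translation": "traducción automática", "post-editing": "posedición"}
print(pretranslate_terms("machine translation and post-editing workflows", glossary))
# -> "traducción automática and posedición workflows"
```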

If content is king, terminology is queen, as they say. Well-managed terminology will also ensure the speedy delivery of every translation project and reduce the time proofreaders and reviewers take to validate the final version before release.

#6: Changes and additions are unavoidable – do not micromanage

We go back to content preparation again. A final document is one translation job, while multiple changes or edits mean starting and stopping the process, and the translators, in every language. The translation company should have a system for tracking changes, but in the language industry this means human intervention: multiple emails and calls, even if only for one word. Changes should be requested in batches or in stages to avoid working with multiple versions of a document.
Some translation and localization projects cost a lot more than they should because of excessive iterations.

Recommended Reading:
5 tips for a cultural adaptation of translations

Evolution of the language technology landscape – TAUS Tokyo

by Manuel Herranz

I attended the last TAUS meeting in Tokyo. This organization has come a long way in promoting machine translation among translation professionals, and primarily among translation buyers. Corporations like Microsoft, Adobe, Dell, eBay, etc. donated large bilingual data sets, which allowed companies to improve the state of machine translation, to run hundreds of tests with Moses in order to improve accuracy, and to find better ways to make machine translation a reality that is now embedded, and taken for granted, in so many products.

Pangeanic's drive to create and develop innovative language solutions for its clients led us to create a new section called PangeaMT, which was the first to use Moses in a commercial setting back in 2009 and served its clients with language automation. Nowadays, it seems that widespread adoption in the wake of solutions provided by non-industry giants like Google and Microsoft has created a language solutions industry built on plugging in 3rd-party APIs. However, research in machine translation has come to a halt due to a lack of funding from institutions, although, paradoxically, adoption by language service professionals is far from general.


Evolution of the language technology landscape – Jaap van der Meer

Jaap van der Meer provided a good overview of events and developments in the translation industry and language technology landscape over the last 30 years at the TAUS Executive Forum Tokyo 2016. I will summarize his review and some of the developments in language automation and machine translation, adding a few facts and spice of my own.

Language technology in the '80s and '90s

Starting in the '80s, the advent of the computer meant that PCs helped translators run spellchecks and grammar checks. That alone, even at such a basic level, marked a change in the role of the translator and in translation as a profession. We moved from the (IBM) typewriter to floppy disks. Software was king, and the term "localization companies" began to replace "translation companies" as computing made it possible to automate translation processes. Perhaps this is a very Western view, and developments took place at different times in different countries. Over these 20 years, the language technology industry saw the development of the tools and concepts we still know today as computer-assisted tools.

Language technology in the 21st century

Gaston Bastiaens, an ex-Philips executive and entrepreneur, went to jail partly because his promises of a personal, wearable translator did not materialize and he had fiddled the sales books and revenues. The "Star Trek translator" does exist nowadays, in different shapes – but it is 15 years too late and in Japan. Mr Bastiaens' company was publicly traded on Nasdaq, and he had fiddled with the numbers in order to make it credible that a "universal translator" was just one year away, in a far advanced stage of development, close to commercialization. This recalls the stories of "high expectations that were not met", just as happened with the 1966 ALPAC report.
Gaston Bastiaens goes to jail, May 2001

From 2000, the globalization and connectivity phase takes place. There is an unhealthy accumulation of translation memories that go to a server. We want all translators to connect and work in synchronized mode whenever possible. There are new ideas about workflow automation, because managing the translation process can be as costly, if not more expensive, than the translation itself. Competition begins in the CAT tool landscape: Star Transit, OpenTM2, Wordfast. We find client-server TM in projects like Euramis and memoQ, and Advanced Leveraging in products like MultiCorpora or Déjà Vu.

Yes, it can take up to 42 steps to get a translation job done. That is the reason why translation companies raised $250M in venture capital money. But translation companies also inflated expectations and, following the well-known Gartner Hype Cycle, innovators tend to be integrated into larger translation companies.

Machine Translation was back in 2007: “Let a thousand systems bloom”

Around 2010, translation becomes a strategic matter on enterprise agendas. It is the age of web services. Technology is able to build all types of APIs, there are web-based TMs, TM and MT work as a hybrid solution, and MT becomes an enabler for other businesses. The value proposition of the language industry is technology integration.

For LSPs, this trend offers new avenues:

  • Diversification of services
  • Testing
  • Digital marketing
  • Consulting

GMS companies become "old technologies" and are absorbed by LSPs. It is the time of companies like XTRF; Lionbridge acquires Clay Tablet, then SDL buys Idiom, etc., in a frenzy of technology integration and lock-in. After all, if you own the tools or the channels that make it easy for your clients to connect to your services, you are on your way to dominating the market.


By 2010, more words are translated by machines than by humans. Enterprises begin to use different technologies, but people at the bottom of the pyramid look for new technologies and tools. The 20th century had an "export mentality": one translation that could fit all types of content and situations. We pick a market and we have to translate for that market; we create a project and we cascade it down the supplier chain, going from English into Japanese, English into Chinese, English into German, English into Spanish, etc. However, information is multidirectional in the 21st century. Quality must be differentiated. Sometimes one may need personalized information for a single organization, individual or company. Facebook has mastered this art, and there are other types of content directed at people who are no longer worried about a small grammar mistake or typo. We do not choose the locale, because users come from a variety of situations and places; they may be familiar with several languages and choose to interface with our content in a language different from their mother tongue for different reasons. Content is also "borrowed" – today, we must be happy if somebody "borrows" content from us!

We are facing translation streaming: translation is continuous and we are approaching a stage of collaborative translation. We need to work together, but in a way in which cloud-based platforms make sense. Translation is therefore multidirectional, not just English into other languages; we also need to understand what people are saying. This has led us to enter the convergence era. Machine translation has become an API: you plug it into your system and you get your content translated. Machine translation is expected to happen.

With the exception of Germany, where "the cloud is not legal", the move is irresistible.

So maybe we, translation professionals, are still thinking we have a "luxury" or offer a "premium" service. But the truth is that most people consider translation a right and think it should be free; this is a 6-billion-user market that expects translation professionals to pay for the infrastructure and technology development.

Convergence is a very broad term. It can mean many things: convergence of consumer, free and paying models. Suddenly, the language technology industry has become attractive. Google was the first innovator / invader in the translation field and has actually changed the way people view machine translation. Many startups have joined the language technology landscape, offering everything from translator productivity tools to translation apps, even "streetwise" translators like QR Translator and extremely affordable speech-to-speech translation systems.

These are exciting times for the language translation industry and for translation experts in general. But what lies ahead of us? Does the future need translators? Chris Wendt from Microsoft Corporation attended the conference in Tokyo for the second year running. He has stated that "Transcreation and adaptation between cultures will remain necessary for one more generation, until the differences between the earth's cultures have been reduced to minor deltas that machines can bridge. But I would not recommend my children choosing human translation as a profession."

As “professional translators” and language specialists, we need to stop and think about the 6 billion potential users. How can we best help them?


Will there be CAT tools in 2020?

by Manuel Herranz
The speed at which computers and computing have evolved over the last 30 years has brought massive changes to many professions – and change for translation companies has not been an exception. However, the speed at which automation has hit and awakened our industry has often been "slow". I mean automation not just in the general sense of machine translation, but most kinds of automation, including project management. Adoption and progress have been painful if we measure them the way clients often measure us: per capita productivity, or how fast you can turn your translations around. The revolution brought about by the surge and increasing acceptance of machine translation in the last 5 years can only find a comparison in the "revolution" brought about by CAT tools in the '90s. So, the question soon arises: Will there be CAT tools in 2020? Is a higher level of automation going to kill, or radically transform, how translators do their work? Will translators do their work as they did in 2000, or will they use interfaces, machine translation suggestions, managed terminology, touch pads? Will CAT tools in 2020 be free from keyboards? And what about voice recognition and machine translation?

No single product epitomizes change better than operating systems. They are the framework upon which our CAT tools sit. Historically, Windows has been the winner for translation software; only a handful of programs had a working Apple version (Swordfish, etc.). Does Windows 8, the new Microsoft operating system, change the way translators work with computers? I have experienced first-hand translators' resistance to moving to newer versions of some CAT tools because it meant moving away from known interfaces into a new territory that (unlike their look-alike iPhones, iPads and Macs) this time looked too unfamiliar. I wonder if the new laptops, with their hybrid features and touch screens, can bring anything new to the translation profession. Eye-tracking has been used to measure what translators and post-editors are doing, but I do not see eye control as a major breakthrough in our profession. Nor do I see touch screens having a major impact on the way translators work. Will translation companies and translators be able to work on a Microsoft Surface? I doubt it. True, typing on the go via an interface can be useful in some situations and for some platforms and business models, but it is an innovation rather than a productive solution on a massive scale.

No translator worth the name in 2014 would send his/her CV without listing the CAT tools he/she uses in order to make work more efficient. Yet very few quote any skills in post-editing or machine translation, despite the massive amounts of money and funding invested in development. Why? Machine translation did in fact exist before translation memories came about in the 1990s. The key difference is that while translation memories produced an easy-to-understand "saving scheme" by way of % discounts, machine translation products have yet to come close to producing a truly reliable confidence score that translates easily into a payment scheme. So, despite post-editing not taking much longer than working with a bad translation memory, or with 75% matches whose other "25%" has to be identified, checked, compared and translated to fit, translators shy away from machine translation.
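For readers unfamiliar with the "saving scheme" in question, here is an illustrative weighted word count of the kind a TM analysis produces. The discount bands and rates are common industry examples chosen for the sketch, not a standard and not any particular vendor's pricing; the point is simply that fuzzy match percentages convert mechanically into billable words, which MT confidence scores still do not.

```python
# Illustrative weighted word count, the kind of "saving scheme" a TM analysis produces.
# The bands and rates are common industry examples picked for this sketch, not a standard
# and not any particular vendor's pricing.
DISCOUNTS = {            # fraction of the full word rate actually charged
    "repetition": 0.10,
    "100%": 0.20,
    "95-99%": 0.40,
    "85-94%": 0.60,
    "75-84%": 0.80,
    "no match": 1.00,
}

def weighted_words(word_counts: dict) -> float:
    """Convert a per-band word count into billable 'new' words."""
    return sum(words * DISCOUNTS[band] for band, words in word_counts.items())

analysis = {"repetition": 500, "100%": 1000, "95-99%": 700,
            "85-94%": 300, "75-84%": 200, "no match": 1300}
print(weighted_words(analysis))  # 4,000 raw words become 2,170 billable words
```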

I have been in the translation industry long enough to remember the initial outcries from translators who were not paid for repetitions or had to accept discounts because of translation matches. But eventually, translators did take up percentage discounts as a means of calculating productivity and payment. Some machine translation providers now present their results in a traditional "translation memory" breakdown so project managers can calculate machine translation post-editing effort as yet another version of TM matching, in the hope of easing post-editing adoption... so much for innovation.

Productivity per translator has been quoted as stuck at around 3,000 words a day on average for the majority of languages for many years. Many freelancers claim higher production rates, but never disclose how they fit proper proofreading and checking procedures into a working day. Dictation software has been a solution for some time, but I have only seen it applied in isolation by freelancers. So, what about disruptive innovations, like "working in the cloud", for translators?


What will CAT tools be like in 2020? Copyright: coramax / 123RF Photo archive

David Canek from Memsource says: “Standalone CAT tools as we know them today will not exist anymore. They will become part of comprehensive translation platforms. These platforms will be cloud-based of course.” I can see where things are going. If some new translation companies, backed by venture capital, are making use of the “typing on the go”, then surely CAT tools in 2020 will have adapted to touch screens. But surely typing on a screen is slower than a keyboard. So some sort of cloud-based system including project management?

I also contacted István Lengyel from Kilgray, who said: “I think that translation will not be particularly touch-friendly, but project management (quoting, invoicing, etc.) can be multichannel. Dictation will remain to play a strong role, but it is already a reality today. I think that translators will be empowered with the management tools that companies have today. I also believe that there’ll be more linguistic knowledge included in the tools, but this is something we already said in 2005 :)” – uhm, that is an interesting insight. Most translators have not even considered adding voice (dictation) to their skills, but those who do make a difference. I do agree with the statement that freelancers and groups of freelancers are going to become more and more empowered with tools that only translation companies could afford a few years ago. So perhaps the change may come from CAT tools in 2020 having clever linguistic features, project management and CRM features to make them an “all-in-one” package.

Paul Filkin from SDL adds: "Given the conservative nature of users in our industry today I think CAT tools will continue to operate along the same basis they do today, but perhaps with more emphasis on post-editing capabilities and plugins to enable personalized machine translation in a more controlled and yet flexible way. There will be greater choice enabling a more modular approach to building your preferred translation environment to suit your needs, and at the same time I think we'll see more integration of our current processes and workflows into the cloud and accessible on more devices for more types of users." – Again, the trend seems to point towards the cloud and more non-translation features, in this case together with greater customization (read: empowerment) for translators and machine translation.

So, what do you think, will there be CAT tools in 2020 or will they be so different that they will not look like a translation memory system any longer? You can say YES/NO if you think they will be similar (YES) or unrecognizable (NO) in our poll.

Twitter, eBay, Facebook…Big data companies want to own machine translation

by Manuel Herranz
Companies creating and managing big data (big data very often means multilingual data, too) sooner or later realize it is in their interest to have direct access to machine translation technology rather than depending on external technology or third-party plugins. Why? I am not talking about owning a translation company, but about the fact that employing another company's machine translation technology signals you have given up technological independence in a core business area. A core activity that generates income, traffic and visibility then depends on the will of another company. This can have serious consequences for your business, and there are plenty of examples of companies having a bad time after doing so. The question is, then: do big data companies want to own machine translation?

eBay's acquisition of AppTek made absolute sense: eBay did not want to rely on third parties for international business, and the multilingual data generated by its users is core to its business. In Korea, Samsung financed the acquisition of Systran via CSLI, until then a small MT player that provided it with Korean/Chinese/Japanese/English machine translation. Systran is the biggest machine translation company in the world, and the acquisition gives Samsung fast and efficient access to a larger number of languages whose MT processing and development would otherwise have been too slow or costly.

Facebook has been using Bing Translator for some time, but it also acquired a European machine translation start-up. Languages and multilingual data mean business. Other machine translation companies can brace themselves to be on someone's shopping list if they can prove their technology is solid enough. Who will be next?

Let us take what has happened between Twitter and Microsoft's Bing Translator as an example. News came this week that Twitter quietly stopped offering users the ability to instantly translate tweets using Bing's machine translation feature. One year ago, the company started using Microsoft's technology, a general online translator. Tweets are particularly difficult to translate as they often contain abbreviations to make messages fit within 140 characters.

Users who had relied on the automated service began noticing the absence of the machine translation feature earlier this week, though Twitter has not specified when it stopped offering the service, nor the reasons behind the decision.

Users who want tweets translated from a foreign language will need to "go back to the past" and copy and paste them into their own online or offline translation service.

Perhaps this will not be much of a problem for monolingual users. But even they may want to find out what a foreign soccer star, singer or basketball player has tweeted. And because of the nature of tweets and their character restrictions, Bing Translator provided translations that tended to range from slightly flawed to incomprehensible. Microsoft did not customize MT engines nor do any particular work for Twitter. It is clear that some users will miss the translation capabilities. However, there has not been a massive outcry around the web, which points to the fact that machine translation is still more useful for knowledge gathering than for direct communication.

Nevertheless, we are facing an interesting move by Twitter, because Yelp added Bing Translator to its iPhone app this same week in order to provide translations of reviews. It is therefore entirely possible that Twitter has decided to drop Bing Translator and grow an in-house solution as eBay once did. Maybe it wants to evaluate a different machine translation product.

But for now, only one thing is clear: tweets will not be translated until further notice, and we are still wondering, given so many acquisitions, whether big data companies want to own machine translation.

Find out more about fully customizable Machine Translation  environments at www.pangeamt.com/en

3 types of machine translation

If you are a content manager and own a business, you know how much time writing takes. You can ask one of your writers to produce good content for your blog, for example a content-rich article of about 1,000 words. That soon adds up to 10,000 words of valuable content that you need to transfer from one language to another. How fast can you expect the work to be done to get Spanish translations, French translations or German translations of those 10,000 words? It can take days or weeks, depending on whether you use freelancers or a professional translation company. And this is where modern technology steps in. If you need to produce volumes of work in several languages within a limited time span and on a tight budget, machine translation is what you need. With this technology you save on two important fronts, money and time, but its deployment must be well planned.

The next few articles on our blog will deal with the importance of planning your machine translation strategy well and incorporating a machine translation workflow into your organization, whether you are a translation company or a translation buyer. We will draw on our experience at Pangeanic as the first LSP in the world to deploy Moses commercially and on how we grew from there to create PangeaMT and serve custom engines and full machine translation systems.

If you need to meet strict deadlines and need your work translated quickly, human translation services might take more time than expected, and rushing human translators is bound to produce mistakes. Squaring the cost/time/quality triangle and building scalable translation strategies is something few companies have achieved in international publication. But with machine translation, and accepting that you will get some comprehension errors, you save time. And in language pairs for which it is very difficult to find a translator (imagine translating Japanese into Turkish, as some of our clients requested recently), machine translation is the only option when speed is essential. Machine translation is a tool to speed up translators' output so they produce more. Popular online translators have made this possible. However, in real-life scenarios, many clients require special formats, very particular expressions and terminology adherence that generalist engines cannot offer. There are many gains in machine translation, but the main benefits always come from building specific, custom engines using the client's previously translated material and terminology.

Gone are the days in which only large corporations could afford to buy machine translation engines. Pangeanic, via its machine translation division PangeaMT, has offered custom-built MT engines to companies and to other Language Service Providers for years, providing them with a key competitive edge and allowing large projects to be completed on time, fast and efficiently.

At Pangeanic, we speak about the 3 typical uses or types of machine translation we can encounter:

  1. for gisting (simply understanding what something says, with little lifetime value and low expectations by the user). Here machine translation engines exist prior to human interaction
  2. for publication (for serious publication work with a higher lifetime value for the document and high quality expectations by the user). Humans are in control of the input with which the engines have been trained and these perform according to their specific needs and domains. Here, machine translation engines are created after human users have decided that it is viable to use MT and they use it for a purpose.
  3. for human interaction (when humans do not speak each other’s language and a voice recognition software converts speech to text which is then machine translated and converted again into speech).
3 types of machine translation: understanding, publication, human interaction


 

Everyone has used free online translation systems. The users' expectation is that the service should be instantaneous, fast and free, and that it should cover as many language pairs as possible. In other words, it should be like a sheet: plenty of breadth but not much depth. Lower quality or unreliable output is acceptable because the service is free.

The second case is the one that concerns translation professionals: the use of custom-built machine translation engines for a specific purpose. Typically, translation professionals will pay for this as a professional service and tool with its own ROI, as it leads to higher output by professional translators who save time typing, reading, understanding and sometimes looking up terminology. A well-built translation engine will contain specific terminology that saves post-editors invaluable time, even if they still need to improve each sentence to make it flow and sound human. Post-edited material, constantly evolving techniques in natural language processing, hybridization and similar advances keep improving these engines. This is the area where machine translation has made the biggest impact on professional, quality publication, as an aid and tool for translators to use.

A spin-off case of the above is the use of customized MT with an API to translate web content on the fly, for example making calls from your content management system to a custom-built engine that can serve fast translations of product descriptions, short reviews, etc. Opening up this type of access to machine translation can create new revenue streams for companies, as they can offer new services to their clients with the right technological partner.
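As an illustration of what such an on-the-fly call might look like, here is a minimal Python sketch of a CMS hook sending a snippet to a custom MT engine over HTTP. The endpoint URL, field names and authentication scheme are hypothetical placeholders for this example, not a documented PangeaMT or other vendor API.

```python
# Minimal sketch of a CMS hook calling a custom MT engine over HTTP.
# MT_ENDPOINT, the JSON fields and the bearer-token auth are assumptions
# made for this example, not a real, documented API.

import requests

MT_ENDPOINT = "https://mt.example.com/api/translate"  # hypothetical URL
API_KEY = "your-api-key"

def translate_snippet(text: str, source: str = "en", target: str = "es") -> str:
    """Send a short piece of CMS content to the engine and return its translation."""
    response = requests.post(
        MT_ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text, "source": source, "target": target},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["translation"]

# Example: translating a product description on the fly before publishing it
if __name__ == "__main__":
    print(translate_snippet("Stainless steel water bottle, 750 ml."))
```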

As explained above, the third application of machine translation engines is human-to-human communication when people do not speak each other's language. Some claims have been made about speech-to-speech translation lately, but mostly in controlled environments. Let us remember that although speech recognition has advanced a lot, the software requires some training time to recognize one's voice, and some accents are recognized better than others. Without prior training, speech recognition can fail, and whatever is lost at that stage is carried into the machine translation and the final text-to-speech conversion.
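The pipeline itself is simple to sketch; the difficulty lies in the quality of each stage. Below is a minimal Python outline of the chain, using the open-source SpeechRecognition and gTTS libraries purely as illustrative stand-ins for the recognition and synthesis stages (one possible combination, not what any particular vendor ships), with the machine translation step left as a placeholder hook for whatever engine is actually deployed.

```python
# Minimal sketch of a speech-to-speech pipeline:
# speech recognition -> machine translation -> speech synthesis.
# SpeechRecognition and gTTS are used here only as illustrative components;
# translate_text() is a placeholder, not a specific product's API.

import speech_recognition as sr
from gtts import gTTS

def translate_text(text: str, source: str, target: str) -> str:
    """Hypothetical hook into a custom MT engine; replace with a real call."""
    raise NotImplementedError("plug in your machine translation engine here")

def speech_to_speech(audio_path: str, source_lang: str, target_lang: str, out_path: str) -> None:
    # 1. Speech recognition: any error made here propagates to every later stage.
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as audio_file:
        audio = recognizer.record(audio_file)
    text = recognizer.recognize_google(audio, language=source_lang)

    # 2. Machine translation of the recognized text.
    translated = translate_text(text, source_lang, target_lang)

    # 3. Text-to-speech on the translated text, saved as an audio file.
    gTTS(text=translated, lang=target_lang).save(out_path)

# Example (commented out): English speech translated and spoken back in Spanish
# speech_to_speech("question.wav", "en-US", "es", "answer.mp3")
```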

Stay tuned to our blog to find out more about Pangeanic's applied machine translation strategies and how our technology has provided success stories for larger translation companies as well as organizations and companies in a variety of sectors.