Author Archives: Manuel Herranz

About Manuel Herranz

Manuel works at Pangeanic, the translation company that provides translation solutions. Pangeanic was the first translation company to successfully implement the Moses machine translation system in a commercial environment as it was released from academia (AMTA 2010, Euromatrixplus.net). Pangeanic is a translation company and developer of language technologies, with a focus on language automation, machine translation, translation APIs, and web technologies for website translations, technical translations and multilingual desktop publishing.

TAUS Tokyo Executive Forum 2014 – Machine Translation becomes embedded

by Manuel Herranz

Despite years of economic stagnation, a feeling we are so familiar with in Europe, Japan proved at the latest TAUS Summit that many good things can be expected from it when it comes to innovation and to machine translation as an embedded application in services and technologies.

However, the first striking news came from Korea. CSLi’s acquisition of Systran had surprised many (I’m no exception), but the presentation at TAUS explained many of the unknowns. It also provided an insight into what the road map may be for the future of machine translation as a driving force in the translation industry. CSLi is the machine translation provider behind Samsung’s famous S-Translator app. Their acquisition of a Western expert with vast experience in European languages has opened up many more language pairs and much more expertise to Samsung. This, in turn, provides the corporation with massive amounts of users’ search and language data.

Hunnect’s experiences with machine translation engines built without big data offered a hands-on view of practical applications. Mr Sándor Sojnóczky classed “little” as 8M words within the life sciences domain. He was able to customize some engines, build on them and obtain real improvements by separating the material into three levels. His life sciences engine was based on a first level of general Life Science corpora, a second level based on Medical Devices and Clinical Devices, and a third level which was specific to a product. Despite the success (post-editors producing around 900 words an hour), the general impression from this non-developer is that MT companies hardly provide the world-class customer care that other types of companies provide. In a word, his machine translation vendor got his money and his ideas and quickly moved on. (Users of PangeaMT presented optimum results with a single software engine at 5 million words at past TAUS events and Localization World, following the launch of our User Empowerment in Barcelona in 2010, but we will refrain from self-promotion.)

Sándor Sojnóczky from Hunnect

Growth at printing companies via machine translation? That was the title of Masanobu Ogata’s presentation from Toppan Printing Co Ltd. Their plan is to offer their translation system to Japanese companies expanding globally for free or at a very low price. The focus is on low-cost operations to translate manga, novels, how-to books and other printed Japanese content, digitize it and sell it through Booklive. They will also use it to reduce their in-house localization work and make it more efficient. If it is free, it could become the standard system in Japan. Right now the system fills the demand for Asian languages, with a business translation ordering system to be launched in late 2014.

However, apart from in-industry news and developments, the limelight fell on two applications that are making translation a utility. One came from NTT Docomo, which introduced a kind of Google Glass device and a menu translator that can magically return translations over pictures taken with one’s mobile phone. I got news only two days ago that Google had bought a start-up to do exactly that, offering road sign translation as a use case.

The other groundbreaking application came from Mark Seligman at Spoken Translation. Mark introduced a live translator for the medical sector which can understand and translate sentences within domains for certain language pairs, and ran a live demonstration over the internet, into German, with one of his associates.

Our presentation on PangeaMT as the ultimate User Empowerment platform with which to experiment and, above all, learn and grow your company’s machine translation strategy was well received and understood, with plenty of Q&A. Buyers of machine translation technology are getting wiser and wiser. They do not want to become passive users of lonely engines with some nice statistics thrown at them.

Explaining the advantage of technological independence rather than becoming an “engine buyer-user”.
Increasingly, the ability to grow one’s system, clean data, get to know the best of Moses and tweak all options, putting maximum customization in the hands of users, is becoming more popular, although some players like to mix up concepts about what DIY and User Empowerment are (for self-promotion) in their presentations at industry events.

Dion at Gala showing hamburgers. Dion at Gala showing instant soups.

The conference continued with Jaap explaining the TAUS roadmap for the Human Language Project, a long-term driving force, like the Human Genome Project, intended to disentangle languages. With the idea of MT becoming the lingua franca, the Data Repository and its matrix of languages is an attractive feature for any machine translation enthusiast. Other work includes quality metrics, studies on finding things like annotated data, and a program called FT2MT, which would include automatic selection for optimal model combination, a shift from translation data to a library of models, and a strong accent on evaluation, which must be automated as human evaluation is too costly and lengthy.

Finally, NTT Docomo’s menu translator was the application that won the TAUS Innovation prize. Plenty of things can be expected from Japan again.

NTT Docomo receives the prize

Tekom and EXPERT Hybrid MT Winter School

by Manuel Herranz
Pangeanic attended two major events during November, promoting its flexible machine translation technologies to translation experts/LSPs and corporate users. We also took part in the first training event of the EU’s Marie Curie EXPERT program, which aims at training young researchers in hybrid machine translation technologies and linking up experienced researchers with industry.

Traditional Japanese music at Tekom

TAUS’ Razheb Choudhury – Manuel Herranz (Pangeanic) – Diego Bartolome (Tauyou)

Tekom concentrated pretty much on terminology management this year. With machine translation now mainstream in most LSPs, or adopted to some degree, we were happy to see that word of mouth resulting from the practical use of PangeaMT by adopters is the best publicity for a technology developed free of chains. Despite the typical noise from some developers, more and more LSPs are interested in owning a technology which empowers them to grow and become technologically savvy, and which enables them to design better solutions for their clients without being technologically dependent.

Lucia Specia – Post-Editing Presentation

Pangeanic’s technology has been available as a powerful platform to language service providers and corporations for some years. This has led the company to become part of national research projects and, currently, a hosting organization where young and experienced researchers will learn and develop novel hybridization skills in the field. The first event within the EXPERT project was held in Birmingham and organized by Wolverhampton University. The project brings academia and industry together, and aims at training the next generation of researchers at leading European institutions, research centres and technology companies. Apart from being an accolade for work already done, the project will allow industrial partners such as Pangeanic to expand machine translation capabilities, language combinations and new hybridization and combination techniques in several areas, making the best of computer-assisted translation and machine translation technologies (example-based, statistical and hybrid approaches) as well as including input from respected figures in post-editing research.

Four Steps to Understanding DIY Machine Translation Customization

by Manuel Herranz
There has been some recent controversy on LinkedIn and in blogs about claims to higher technical levels of engine customization, what machine translation engine customization is, what DIY Machine Translation Customization is, and how some people understand it.

PangeaMT specializes in custom-built systems which users (typically LSPs and translation departments) can later re-train in two different ways:

1. on-site if they have a full system installation (when data privacy is an issue)
2. using our own servers, in SaaS and via our API.

K. Vashee states that “The reality is that running an open source MT solution or using a “upload and pray” solution like that of many DIY MT vendors has become very easy.” This is a gross misunderstanding of what DIY MT is. DIY is about empowering the MT user to take control of the system or at least part of the process, rather than being a passive receiver of MT output that has to be quickly post-edited.

Building an MT engine became pretty popular (that’s different from easy) and widespread in 2013. Systems are getting better as more and more data becomes available. Yet data is not everything. One of our largest engines at PangeaMT holds more than 190 million words, and other engines contain five or six TMX files with over 300 MB of text data inside each. Some little engines with under 5M words perform very well for the documentation task for which they were built (see our joint presentation with Sybase at Localization World 2011 below).

I do not know any MT system builders who claim that using unclean data will not affect the output, or who leave such freedom to MT system users without training them. That is a key differentiator for PangeaMT: we train users so they can have an impact on how their MT will evolve and develop. An initial revision of (at least) part of the material, or of typical chunks of text within the domain, is the first step to MT engine customization. I summarize some key steps for a good DIY SMT implementation, whether on-site or off-site (SaaS):

1. Gather relevant, in-domain material.
Your own material is key for the best engine performance. The material you have translated in the past is likely to be similar to the material you will translate in the future. Those expressions, terminology lists, translation memories, HTML files, parallel data, even monolingual texts, will form the basis of your customized engine.
However, there may be times when you cannot share all your data. Do not despair: this is the advantage of PangeaMT. Any general, related data will serve the purpose for the engine set-up. We will train you and show you potential pitfalls with training sets and cleaning.

2. Ask your vendor to analyze the data provided and run cleaning procedures.
Your MT vendor should be transparent about “dirty data” and discarded segments, and should present an analysis of the troublesome segments or datasets which should not be used for machine learning. Dirty data does not mean “bad translation”; very often it means “noise” introduced by the translation management tool itself, rendering a segment unusable for machine learning. Explaining rather than translating, or offering bilingual versions, will of course confuse learning patterns. So will punctuation such as – ” “ , ; : added profusely where it should not be, bad alignments, and segments where the source is the same as the target.

Data cleaning is a key step in the system. We recommend deleting segments rather than trying to “repair” them. Most of the time, repairing is not worth the effort, unless your data is really dirty.

A lot of cleaning can be done prior to the material entering the system (see below).

Untranslated “to” would affect machine translation learning

There are more complicated “cleaning” routines which fall outside the scope of this article and involve revising alignments in phrase tables. We will leave that for keen system users.
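The simpler, segment-level checks mentioned in this step can be sketched in a few lines of Python. This is only an illustration, not PangeaMT’s actual cleaning routine: the function names and both thresholds are assumptions made up for the example, and a real pipeline would tune them per language pair.

```python
import string

# Hypothetical thresholds, chosen only for illustration.
MAX_PUNCT_RATIO = 0.4   # maximum share of punctuation characters in a segment
MAX_LENGTH_RATIO = 3.0  # word-count ratio above which we suspect a bad alignment

# ASCII punctuation plus a few typographic marks (dashes, curly quotes, ellipsis).
PUNCT = set(string.punctuation) | set("\u2013\u2014\u201c\u201d\u2018\u2019\u00ab\u00bb\u2026")


def punct_ratio(text):
    """Fraction of non-space characters that are punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in PUNCT for c in chars) / len(chars)


def keep_segment(source, target):
    """Return True if a source/target pair looks usable for training."""
    src, tgt = source.strip(), target.strip()
    if not src or not tgt:
        return False  # one side is empty
    if src.casefold() == tgt.casefold():
        return False  # source same as target: probably untranslated
    if punct_ratio(src) > MAX_PUNCT_RATIO or punct_ratio(tgt) > MAX_PUNCT_RATIO:
        return False  # mostly punctuation: likely tool noise
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if max(src_len, tgt_len) / min(src_len, tgt_len) > MAX_LENGTH_RATIO:
        return False  # very different lengths: likely misaligned
    return True


def clean(pairs):
    """Delete (rather than repair) suspicious pairs; return kept pairs and the number dropped."""
    kept = [pair for pair in pairs if keep_segment(*pair)]
    return kept, len(pairs) - len(kept)
```

Reporting the number of dropped pairs alongside the kept ones mirrors the transparency asked of the vendor above: you can see how much material was discarded instead of silently losing it.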

3. Perform initial tests (first engines) together with your vendor.
Your vendor may do this and just present you with the final “good” engine, or with a variety of engines depending on your specialization. A common training method is to set aside 2,000 segments from the training material and then ask the engine to translate those segments, thus obtaining a BLEU score (i.e. a measure of how good the system thinks it is). However, this is neither the only way nor the most efficient one, and BLEU percentages cannot be compared across languages, nor even within the same language for different domains. An engine scoring 55% BLEU is no good when asked to translate out-of-purpose material, whereas PangeaMT systems have been reported to provide productivity increases of 50% to 300% in German with small engines scoring 38% BLEU but built for very specific purposes like software documentation or automotive manuals.

Put the engine to the test with previous translations you have not provided, or with similar material.
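As a rough sketch of the hold-out method described in this step, the snippet below sets aside 2,000 segment pairs before training so that the engine can later be asked to translate material it has never seen. The function name, the fixed seed and the list-of-pairs data layout are assumptions made for illustration; scoring the output (with BLEU or any other metric) is deliberately left out.

```python
import random


def split_held_out(segments, test_size=2000, seed=13):
    """Randomly set aside test_size segment pairs; return (training, held_out)."""
    if test_size >= len(segments):
        raise ValueError("not enough segments to hold out a test set")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(segments)))
    rng.shuffle(indices)
    held_out_idx = set(indices[:test_size])
    training = [s for i, s in enumerate(segments) if i not in held_out_idx]
    held_out = [s for i, s in enumerate(segments) if i in held_out_idx]
    return training, held_out
```

Because the held-out pairs come from the same material as the training data, a score obtained this way tends to be optimistic, which is one reason to also test with translations the vendor has never received.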

4. Learn about engine re-training and the impact of post-edited material.
How big is your engine? How many words does it contain? What is its BLEU or METEOR score? How many words do I need to retrain my engine? Does my vendor ask for 5% or 10% of the engine size, or does it promise on-the-fly re-training with just one sentence? Even though that sounds pretty good, a 20-word sentence will have little impact on any engine, particularly considering that even “small” MT engines may contain 5 million words.

We recommend a route whereby your post-edited material can enter the re-training cycle at any time, and a system where you are in control of both cleaning and re-training. PangeaMT offers both. You can upload new material any time after you complete a translation or finish a post-editing job. The latter is extremely good material and several papers point to benefits of post-edited material in MT engines. You can also schedule or set immediate re-training.
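The arithmetic behind the retraining question is simple enough to sketch. The helper names below are made up for this example, and raw word count is only a crude proxy for the influence new material has on an engine, so treat the numbers as orders of magnitude rather than as a rule.

```python
def relative_size(new_words, engine_words):
    """New material expressed as a percentage of the existing engine size."""
    return 100.0 * new_words / engine_words


def words_needed(engine_words, percent):
    """Number of words that corresponds to a given percentage of the engine size."""
    return round(engine_words * percent / 100)


engine = 5_000_000  # a "small" engine, as in the text

print(relative_size(20, engine))  # a single 20-word sentence, as a percentage
print(words_needed(engine, 5))    # words needed for a 5% retraining batch
print(words_needed(engine, 10))   # words needed for a 10% retraining batch
```

A 20-word sentence is roughly 0.0004% of a 5-million-word engine, whereas a 5% batch means about 250,000 words of new or post-edited material.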

PangeaMT engine control panel

Those four steps are basic checkpoints you should bear in mind when moving your organization towards higher automation and adopting MT. Above all, you should also consider the cost of “ownership” or “SaaS” according to your needs and how deep you want to go in MT. Do you wish to position yourself as an authority with fully customized machine translation technology in your language pair or field? PangeaMT will help you. Or do you simply wish to save time and translate faster, without changing tools? Our TMX workflow will help you.
Many tools are fully compatible with PangeaMT, and our philosophy is to engage with tool and platform providers to offer open-standard solutions, with no tie-ins. Our SDL plug-in allows you to work with a well-known tool and, simultaneously, benefit from being the owner of your own engines and use the translation memory to build, customize and re-train the engine(s) for the next jobs. With PangeaMT, you will get an instant suggestion from your engine and choose whichever is more relevant, the translation memory match or the suggestion translated by the engine. Post-editing takes a few seconds, whereas translating sentences from scratch can sometimes take almost a minute.
Because every engine is built with your own material, it is specific to you only and trained to perform and translate in the fields you specialise in and nothing else. Following strict TMX cleaning procedures and engine training methods, customized engines become extremely useful translation tools that aid translators in their everyday tasks. Your future post-edited material can retrain the engine very fast, improving accuracy more and more with every job.
Next time you think languages, think Pangeanic
Your Machine Translation Customization Solutions

         

Understanding Machine Translation Customization and DIY MT

by Manuel Herranz

The same mistake that was made by many translation agencies, translation companies and now language service providers is being made by tough machine translation companies: “My (machine) translation is better than yours”, “my machine translation system works, everybody else’s doesn’t”. Translation companies have learnt that they cannot sell translation services on “translation quality claims” alone, or on “I am better than you because…”, but it seems that some machine translation companies have yet to learn the same lesson. I am referring particularly to those with risky levels of investment or venture capital to repay, and without the testing ground of in-house native speakers or a real translation department in which to test their technologies and MT before release. At times, such companies obtained their “high quality clean data” by bombarding Google Translate and applying cleaning cycles which included manual revision by local, non-native graduates. Many LSPs fall for the big marketing campaigns and strong wording; the limelight is always very attractive. Translation Memory technologies are good proof of that.

Bad-mouthing the competition is the last marketing tool I would recommend to anybody in sales, marketing or representing a company. Talk about your strengths. Acknowledge what you cannot do, but explain what you can do to solve the problem. If you cannot match some offerings from the competition, saying that they do not work is a terrible policy. There are tens of use cases, applications, conferences and presentations to prove that DIY MT, for example, works and is in good health, being used at LSPs, institutions and corporations. As far as I know, automated retraining and Moses packaging are part of at least two EU-funded programs. As organizations such as Gala provide an excellent platform for machine translation webinars, monopolistic attitudes become more and more aggressive.

But I want to minimize self-promotion. What Kirti Vashee seems to forget in his virulent blog entries is that no company will release a tool that doesn’t work, nor install a product that cannot do what it claims it can. I was an industrial engineer for long enough to learn at least the difference between what works and what doesn’t. When it comes to hardware tools, quality may be easy to spot. When it comes to services (and in machine translation this is clear: “my output”, “my clients”, “my productivity” and “my technological independence”), quality is what works best for me. Claiming that in 2013 MT is so complex that only one company fully understands it is presumptuous, to say the least.

Let me quote some translation agencies (the term Language Service Provider being unknown to the majority of people outside the language industry). They are not big companies, possibly what economists call small and medium-size companies.

Tilde, Apsic, Lexcelera, Pangeanic. I am sure at least four others could make it onto this list. What do these companies have in common? All of them were or are translation companies that have transformed themselves into higher solution providers, either by developing software solutions that solved particular problems in translation or by customizing technology into their processes. With the help of EU funds and a clear vision to fill a market need, Tilde led R&D projects aimed at developing machine translation for less-resourced languages. Automated engine creation and re-training were part of the initial EU-funded project.

Apsic is the developer of one of the best consistency-checking tools (XBench), which is a must for any company wanting to ensure terminology consistency and error-free deliveries over hundreds of files.

Pangeanic has developed a management system on top of Moses which manages training sets and automatically cleans some data, trains engines and creates new engines with a variety of other customizable features.

As MT customizers, we know that initially some settings, parameters, weights and features need to be configured carefully to get a good start. But I do not know of any company in the software business that insists on manual processes and cannot automate what it has to do repetitively.


How to build, run and own your machine translation ecosystem – Pangeanic at LocWorld London 2013

Pangeanic will exhibit its Pangea machine translation technology at LocWorld London 2013 inside ELIA’s booth. Over the 3-day event, you will have a chance to meet our representatives and see for yourself how PangeaMT works and how easy it is to create translation engines, manage and update them, clean and segregate training material and, of course, obtain translations in portable and open formats (from TMX and XLIFF to XML-compliant docx, odt, html and ttx). Manuel Herranz is also a guest speaker at the pre-conference day, where he will speak as an experienced implementor of machine translation (MT) technologies at LSPs and for large organizations with big publishing needs. Pangeanic was a founding member of TAUS, the industry think-tank, and of its spin-off, the Data Association. Advancements in machine translation led Pangeanic to become the first language service provider to successfully apply Moses, as recorded by the EU research program Euromatrixplus. Its ground-breaking DIY features, released in 2011, were “ahead of the times” and are now part of many MT offerings. Manuel will focus his talk on how machine translation is not only being adopted by technical departments, but also by translators themselves, who see the opportunity of customizing engines as another helping tool in their jobs. This is an abstract of the presentation:

Gone are the days when machine translation meant a desktop application, an off-the-shelf product or a post-editing job imposed by a translation agency.
Or are they? If you work for, own or outsource translation work to an LSP, machine translation may still sound like something exotic, something only “the big boys” do, something terribly complicated based on unintelligible algorithms.
Fear not. Since 2011, small agencies, translation departments and even freelance translators can create their own engines at will. Increasingly, the translator with a tailor’s complex is giving way to a data handler who can mix and match pre-built engines, add new material at will and service pre-translations for fast post-editing. We do not need to wait for technology to evolve from electrons to photons for fast MT customization. It is already here. Some people used to be translators; now they have “MT Customization Specialist” on their name cards.

Multilingual web is more than translation (1/2)

by Manuel Herranz

It is beyond doubt that the web has become multilingual. The work, experiences and cross-pollination with other disciplines, from machine translation to localization and semantics, were shared at the EU-sponsored Multilingual Web event which took place in Rome on 12-13 March 2013.

Whilst technologies such as machine translation are already well integrated for fast web page translation, it was reassuring to see that even large web actors, such as Google, consider there is plenty of work to do in making the web truly multilingual. The release of ITS 2.0 and the new features and possibilities that HTML5 opens up made the venue a meeting point for professionals, practitioners and academics dealing with the semantic web, translation and applied machine translation, as well as for CMS tool providers.

Google’s experiences were shared by Mark Davis and Vladimir Weinstein, who pinpointed translation and localization issues which are often overlooked. We already assume that a page can be easily translated for gisting, but smaller issues like plurals and gender (“Alice added 1 people to his circle”) remain unsolved even in the likes of Facebook.
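To make the “Alice added 1 people to his circle” problem concrete, here is a minimal sketch of choosing the noun by plural category and the possessive by gender, with a neutral fallback when the gender is unknown. It covers English only, and the function names and message wording are invented for the example; production systems generally rely on per-language plural rules such as those in Unicode CLDR rather than on hand-written conditions like these.

```python
def plural_category_en(count):
    """English has two plural categories for whole numbers: 'one' and 'other'."""
    return "one" if count == 1 else "other"


def circle_message(name, count, gender=None):
    """Build the notification text; gender may be 'female', 'male' or None (unknown)."""
    noun = "person" if plural_category_en(count) == "one" else "people"
    possessive = {"female": "her", "male": "his"}.get(gender, "their")
    return f"{name} added {count} {noun} to {possessive} circle"


print(circle_message("Alice", 1, "female"))  # Alice added 1 person to her circle
print(circle_message("Alex", 3))             # Alex added 3 people to their circle
```

The English logic is trivial; the difficulty is that other languages have more plural categories and inflect more words for gender, so a sentence assembled this way cannot simply be handed to translators word by word.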

Google got my language settings wrong

Everything’s better with a little sense of humour

Presenting people’s names in a locale is not as easy as quoting them. Patterns differ: English, for example, prefers to add nicknames after the given name, whereas other cultures have a second name, or the father’s and the mother’s names. This would later be dealt with by Richard Ishida in a full presentation.

It was encouraging to see that, when localizing, Google faces the same hurdles as most translation companies:

  • Different messages to different translators
  • Most translators are not software engineers
  • Most engineers don’t speak 60 languages
  • May not know the gender

Making the web multilingual is not about translating (that may be fine for the content) but about presenting the web in a format and with an experience that is user-friendly in a given culture. Google has gone a long way in handling plurals, numbers written as digits, cardinals (1, 2…), ordinals (1st, 2nd…) and currency conversion, and even in identifying the likely country when you type a phone number like (011) 34345345 (much easier when it is +54 9 98408374). This is used in technologies such as geolocation in Android, and it also includes ways of resolving addresses, handling detailed validation for many regions and presenting a layout and basic validation for all regions.

Studying the language trends of Gmail users, Google knows that a fairly large number of them are multilingual. I would agree with some theories that state that about half of the world’s population knows at least one other language, is familiar with it or is bilingual. But, embarrassingly, things can get complicated if one is signing up for a service (let’s say Google+) in one language from an IP location where a different language is used. This means that you may get mixed languages if you are a Spanish-speaking user signing up for YouTube whilst in Japan, and that’s personal experience…

Questions google cannot answer :)

Rendering names local
Richard Ishida gave a nice and funny presentation on the classification of names and on how presenting them and cataloguing them in fields varies greatly (with all the issues this involves). In India, a “caste” tag would be necessary. In Spanish cultures, people carry both the father’s and the mother’s surname (although there are exceptions to the order in which they are presented) and do not change their surnames when they get married, just as happens in Chinese. This may look strange if you happen to come from a Northern European culture.

In Russian, where women also adopt the husband’s family name (thus presenting a “search” challenge on the web), things get slightly different, as surnames inflect for masculine and feminine. So the wife’s surname is not exactly the same as her husband’s. He will also carry the name of his own father as a middle name (ending in -ich) to state that he is the son of [name of father].

Борис Николаевич Ельцин
(Given name: Boris) (Father’s name, masculine: Nikolayevich) (Family name, masculine: Yeltsin)

Наина Иосифовна Ельцина
(Given name: Naina) (Father’s name, feminine: Iosifovna) (Family name, feminine: Yeltsina)
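The “search” challenge can be illustrated with a small sketch that maps a masculine surname to its feminine form, so that both spouses can be matched as one family. The suffix table is a deliberately small, illustrative subset of Russian surname patterns and the function names are hypothetical; real surnames include indeclinable and foreign forms, so this should not be read as a complete rule set.

```python
# Masculine -> feminine endings for a few common Russian surname patterns.
# Illustrative subset only; many surnames follow other rules or do not inflect.
FEMININE_SUFFIXES = [
    ("ский", "ская"),
    ("цкий", "цкая"),
    ("ов", "ова"),
    ("ев", "ева"),
    ("ёв", "ёва"),
    ("ин", "ина"),
    ("ын", "ына"),
]


def feminine_surname(surname):
    """Return the feminine form of a masculine surname, or the input unchanged."""
    for masculine_end, feminine_end in FEMININE_SUFFIXES:
        if surname.endswith(masculine_end):
            return surname[: -len(masculine_end)] + feminine_end
    return surname  # already feminine, indeclinable, or not covered here


def same_family_name(surname_a, surname_b):
    """Compare two surnames while ignoring gender inflection."""
    return feminine_surname(surname_a) == feminine_surname(surname_b)


print(feminine_surname("Ельцин"))             # Ельцина
print(same_family_name("Ельцин", "Ельцина"))  # True
```

Normalizing both names to one form before comparing is the design choice here; a search index would typically store that normalized key alongside the name as the person actually writes it.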

In Arabic, you can add being the “father of” someone to your name later on in life, as well as your place of origin and your qualities. This obviously affects forms of address.
arabic name convention

One extreme case, though not so different from a classification perspective, is Icelandic, where what we might take for a surname is the father’s name plus a collection of family identifiers.
bjork
Imagine, then, the challenge of finding, then identifying and presenting people across different languages in an automated way…

The second part will wrap up the event with use cases, applied machine translation, CMS and Translation Management Systems.

Next time you think languages, think Pangeanic
Machine Translation Engines from PangeaMT

follow us on –> Follow manuelhrrnz on Twitter  @Pangeanic   @manuelhrrnz

Help translation project to protect global internet freedom

The following statement was written by Ellery Biddle in the Advocacy section of Global Voices. We urge all our readers to share it and to understand the serious issue at stake behind Internet governance. If the Internet can be controlled in the way that is proposed and its openness constrained, our global rights as netizens will be compromised. Ellery’s call follows below, as it appears in the blog.

“Over the next seven days, Global Voices Lingua volunteers will be translating a public online petition that supports the protection of human rights online and urges government members of the International Telecommunication Union (ITU) to preserve Internet openness at the upcoming conference of the ITU.

Open for sign-on by any individual or civil society organization, the Protect Global Internet Freedom statement reads as follows:

On December 3rd, the world’s governments will meet to update a key treaty of a UN agency called the International Telecommunication Union (ITU). Some governments are proposing to extend ITU authority to Internet governance in ways that could threaten Internet openness and innovation, increase access costs, and erode human rights online. We call on civil society organizations and citizens of all nations to sign the following Statement to Protect Global Internet Freedom:

Internet governance decisions should be made in a transparent manner with genuine multistakeholder participation from civil society, governments, and the private sector. We call on the ITU and its member states to embrace transparency and reject any proposals that might expand ITU authority to areas of Internet governance that threaten the exercise of human rights online.

To sign the petition, visit the Protect Global Internet Freedom website. To sign, enter your first name, last name, email address, organization name (if you are signing on behalf of a civil society organization), organization URL, and select your country.

All translations will also be posted on the petition site, which is hosted by OpenMedia, a Canada-based digital rights group.

As translations appear (see above), please feel encouraged to share links on social networks and with friends!”


I want you to speak English or get out

EU reduces translation budget – Machine Translation and Post-editing, one future

by Manuel Herranz

On 21st November 2012, lawmakers approved a report by Stanimir Ilchev, a Bulgarian Liberal MEP, that will change the procedural rules for recording plenary debates. This decision could be a godsend for machine translation and language technology developers, as the EU plans to improve translation productivity (or turnaround times) by 25%, which is a target in current R&D Language Technology Funding Calls.

Starting from the next plenary, on 10th December, the European Parliament will no longer be required to translate the session into all 23 official languages of the EU. Over the years, this requirement has proved quite costly, and the translations can take up to four months. However, a bias towards the English language has been pointed out in many circles and instances. For example, Jean Quatremer, a renowned French political journalist from the French daily Libération, complained about the official press statements containing the Commission’s economic recommendations to member states, published on 30th May 2012.

These statements had been eagerly awaited by the press because of the euro debt crisis, but initially were only made available to journalists in English. The translations into other languages followed a few hours later that day. Mr. Quatremer said that the initial monolingual release provided the Anglo-Saxon press with an “incredible competitive advantage” and that it threw into doubt the institutions’ democratic legitimacy, making his position very clear in a very strongly worded blog entry.

From December 2012, the EU legislature will only record proceedings in the original language of the speaker. Nevertheless, the proceedings will still have to be translated into a particular language if there is a request by a member state. However, in the European Parliament many official press statements are currently published only in English and a very limited number of them are translated into other languages, despite huge efforts and money invested in translation services and, increasingly, in machine translation technology. “This is one of our struggles – that the press releases and all publications and communications with society (tenders, contracts, etc.) are translated,” said Miguel Angel Martinez Martinez, the Parliament’s Vice-President in charge of multilingualism.

Numbers speak for themselves: 72% of all EU documents are drafted in English, with French coming a distant second with 12%. Only 3% are originally drafted in German. On the other hand, 88% of the users of the Commission’s Europa website speak English. In reality, “providing documents in English, French, German, Spanish and Italian would cover close to 100% of all the EU’s linguistic needs”, said the DG Translation Director-General Lönnroth, speaking at a debate hosted by the Centre for European Policy Studies on 22nd February. The Union “will just have to cope” with increasing linguistic pressures brought on by future enlargements because “no decision-maker would dare to touch the main principles” of the EU’s language policy. Mr Ilchev rejected proposals to translate the sessions only into English, as it would “appear linguistically unjust”.

In the current EU, having 23 official languages means 506 translation and interpreting combinations, said Director-General Lönnroth, a figure which can increase significantly when Croatia and Serbia join, and even Turkey, in the foreseeable future. Acknowledging he is not a “language fanatic”, the director-general claimed he thinks “about how to reduce the workload every day” as it was “not in the taxpayer’s interest” to provide every language combination. Lönnroth said back in February that “it would be easier if everybody accepted that English and French were the main EU languages”. This is (partially) what is going to happen, although Mr. Ilchev gives assurances that the initiative will not harm multilingualism, a principle enshrined in EU treaties: “of course this principle is not in question and everyone can listen to our debates in plenary in their own language” – through interpretation.

Some of the EU’s research funding actually goes into technology solutions and research. For example, the SUMMAT project aims at creating an online service for subtitling by machine translation.
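The figure of 506 combinations quoted above is consistent with counting every directed source-to-target pair among 23 languages, that is, n × (n − 1). The sketch below is only an illustration of that arithmetic; the function name is hypothetical, and the value 24 is simply an assumed example of one additional official language, not a figure from the debate.

```python
def translation_combinations(official_languages: int) -> int:
    """Count directed source->target pairs among n languages: n * (n - 1)."""
    if official_languages < 0:
        raise ValueError("number of languages cannot be negative")
    return official_languages * (official_languages - 1)


if __name__ == "__main__":
    # 23 is the number of official EU languages cited in the text;
    # 24 is an assumed example of adding a single language.
    for n in (23, 24):
        print(f"{n} languages -> {translation_combinations(n)} combinations")
```

Because the count grows roughly with the square of the number of languages, each new official language adds more combinations than the one before it.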

Next time you think languages, think Pangeanic Machine Translation Engines from PangeaMT

European Day of Languages – A Call from META-NET

by Manuel Herranz

In the wake of the European Day of Languages, the following press release was published by META-NET. This organization is the Network of Excellence dedicated to fostering the technological foundations of a multilingual European information society. Pangeanic is a member, linking up with other organizations with an interest in machine translation via its technology division PangeaMT.

The message of the call is important enough to be reproduced in full, without editing, with the permission of the organization.

At Least 21 European Languages in Danger of Digital Extinction

Good News and Bad News on the European Day of Languages

Most European languages are unlikely to survive in the digital age, a new study by Europe’s leading Language Technology experts warns. Assessing the level of support through language technology for 30 of the approximately 80 European languages, the experts conclude that digital support for 21 of the 30 languages investigated is “non-existent” or “weak” at best. The study “Europe’s Languages in the Digital Age” was carried out by META-NET, a European network of excellence that consists of 60 research centres in 34 countries, working on the technological foundations of multilingual Europe.

Europe must take action to prepare its languages for the digital age. They are a precious component of our cultural heritage and, as such, they deserve future-proofing. The European Day of Languages on September 26 recognizes the importance of fostering and developing the rich linguistic and cultural heritage of our continent. The META-NET study shows that, in the digital age, multilingual Europe and its linguistic heritage are facing challenges but also many possibilities and opportunities.

Languages of Europe

Distribution of languages in the European continent

The study, prepared by more than 200 experts and documented in 30 volumes of the META-NET White Paper Series (available both online and in print), assessed language technology support for each language in four different areas: automatic translation, speech interaction, text analysis and the availability of language resources. A total of 21 of the 30 languages (70%) were placed in the lowest category, “support is weak or non-existent” for at least one area by the experts. Several languages, for example, Icelandic, Latvian, Lithuanian and Maltese, receive this lowest score in all four areas. On the other end of the spectrum, while no language was considered to have “excellent support”, only English was assessed as having “good support”, followed by languages such as Dutch, French, German, Italian and Spanish with “moderate support”. Languages such as Basque, Bulgarian, Catalan, Greek, Hungarian and Polish exhibit “fragmentary support”, placing them also in the set of high-risk languages.

“The results of our study are most alarming. The majority of European languages are severely under-resourced and some are almost completely neglected. In this sense, many of our languages are not yet future-proof.”, says Prof. Hans Uszkoreit, coordinator of META-NET, scientific director at DFKI (German Research Center for Artificial Intelligence) and, together with Dr. Georg Rehm (DFKI), co-editor of the study. Dr. Georg Rehm adds:  “There are dramatic differences in language technology support between the various European languages and technology areas.

Family of European Languages

The gap between ‘big’ and ‘small’ languages still keeps widening. We have to make sure that we equip all smaller and under-resourced languages with the needed base technologies, otherwise these languages are doomed to digital extinction.”

The field of language technology produces software that can process spoken or written human language. Well-known examples of language technology software include spell and grammar checkers, interactive personal assistants on smartphones (such as Siri on the iPhone), dialogue systems that work over the phone, automatic translation systems, web search engines, and synthetic voices used in car navigation systems. Today language technology systems primarily rely on statistical methods that require incredibly large amounts of written or spoken data. Especially for languages with relatively few speakers it is difficult to acquire the needed mass of data. Furthermore, statistical language technology systems have inherent limits in their quality, as can be seen, for example, in the often amusing incorrect translations produced by online machine translation systems.

Europe has succeeded in removing almost all borders between its countries. One border still exists, however, and it seems to be impenetrable: the invisible border of language barriers is one that hinders the free flow of knowledge and information. It also harms the long-term goal of establishing a single digital market because it hinders the free flow of goods, products, and services. While language technology has the potential to get rid of language barriers through modern machine translation systems, the results of the META-NET study clearly show that many European languages are not yet ready. There are significant gaps in technology due to the English-language focus of most R&D, a lack of commitment and financial resources, and also a lack of a clear research and technology vision.

A coordinated, large-scale effort has to be made in Europe to create the missing technologies as well as transfer technology to the majority of languages. There are strong reasons for approaching this immense challenge in a community effort involving the EU, its member states and associated countries, as well as industry: the high per-capita financial burden for smaller language communities; the needed transfer of technologies between languages; the lack of interoperability of resources, tools, and services; and the fact that linguistic borders often do not coincide with political borders.

Language Technology: Background

Language technology already supports us in everyday tasks, such as writing e-mails or buying tickets. We benefit from language technology when searching for and translating web pages, using a word processor’s spell and grammar checking features, operating our car’s entertainment system or our mobile phone with spoken commands, getting recommendations in an online store, or following the instructions spoken by a mobile navigation app.

In the near future, we will be able to talk to computer programs as well as machines and appliances, including the long-awaited service robots that will soon enter our homes and work places. Wherever we are, when we need information or help, we will simply ask for it. Removing the communication barrier between people and technology will change our world.

Language technology is generally acknowledged today as one of the key growth areas in information technology. Large international corporations such as Google, Microsoft, IBM, and Nuance have invested substantially in this area. In Europe, hundreds of small and medium enterprises have specialized in certain language technology applications or services. Language technology allows people to collaborate, learn, do business, and share knowledge across language borders and independently of their computer skills.

The META-NET White Paper Series

The META-NET White Paper series “Europe’s Languages in the Digital Age” reports on the state of 30 European languages with respect to Language Technology and explains the most urgent risks and chances. The series covers all official EU Member State languages and several other languages spoken in Europe. While there have been a number of valuable and comprehensive scientific studies on certain aspects of languages and technology, until now there has been no generally understandable compendium that presents the main findings and challenges for each language with regard to a technology-supported multilingual Europe. The META-NET White Paper Series fills this gap. META-NET can now show why most languages face serious problems and pinpoint the most threatening gaps. In total, more than 200 authors and contributors helped preparing the Language White Papers.

The white papers were written for the following European languages: Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian (bokmål and nynorsk), Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, and Swedish. Each Language White Paper is written in the language it reports upon and includes a complete English translation.

About META-NET and META

META-NET, a Network of Excellence consisting of 60 research centres from 34 countries, is dedicated to building the technological foundations of a multilingual European information society. META-NET is co-funded by the European Commission through a total of four projects.

META-NET is forging META, the Multilingual Europe Technology Alliance. More than 600 organisations from 55 countries, including research centres, universities, small and medium companies as well as several big enterprises, have already joined this open technology alliance.

Background Information / Volumes / Press Releases / Quotes:

  • http://www.meta-net.eu
  • http://www.meta-net.eu/whitepapers
  • http://www.meta-net.eu/whitepapers/all-quotes-and-testimonials
  • http://www.meta-net.eu/whitepapers/press-release (including ca. 30 translations of this press release)
 
Contact:
Prof. Dr. Hans Uszkoreit
Dr. Georg Rehm
META-NET Office c/o DFKI GmbH
Alt-Moabit 91c
10559 Berlin, Germany
Phone:       +49 30 23895-1833
Email:       georg.rehm@dfki.de

A message from Manuel Herranz from Northern Cameroon

by Manuel Herranz

Dear friends, colleagues, and acquaintances,

Many of you already know that I am spending almost two months doing volunteer work in northern Cameroon, in a small town called Gouria. I had to travel 50 km on a motorcycle to write this email. This is the third day I have tried, since the connection is so unreliable. It is very difficult to get into my Gmail account because the computers are old and do not load modern pages well. On top of that, it took another 20 miles today and over an hour just to log in to Gmail, not counting the time spent writing the message. The Internet speed is no more than 46kb, which is what we had almost 20 years ago. I am writing this message in WordPad because, even once I am logged in, the connection may fail at any time. I’ll paste it into the body of the message when I have finished.

You all know me for my development work in machine translation and the applications of machine translation to LSPs. Anyone who thinks I left work and family to go on a safari is wrong. This is a humbling experience for anyone from the first world, even in times of crisis. My mission here was to rebuild a library for the school that Judith, one of my wife’s colleagues at her school, has built with much effort, admirably and almost single-handedly. You can find all the information at www.malimaproject.org

Also, I came with two laptops and the idea of setting everything in motion, including computer training, so that children can touch a keyboard for the first time and type their name, that of their family and any sentence that comes to mind. Education is one of the basic rights of humanity. But I soon realized that I cannot stand idly by once I have seen firsthand how half the world population lives [again], and particularly urgent needs such as those we have here. I’m here and I can help with many more issues until April 10th. I’m talking about small, very basic knowledge of medicine and engineering.

I’m not going to ask for much. In the town where I live there is practically no electricity in any of the houses, which are mostly built with mud bricks. The Malima school is a fortunate island. There is no running water either. Water is a huge problem that could be solved fairly easily because it rains a lot in August and September, but they have no means to collect, store or sanitize the water, which would be good and would keep them alive throughout the year. They make do with forage, underground wells. I want to ask you to take advantage of my stay here to send some very simple things, which can nevertheless make the difference between a painful infection and permanent blindness, or save someone’s eyes.

Even toothbrushes and toothpaste will be a real advance in dental care and education. I am here until 10th April, so there is time. Remedies for stomach pain, like bicarbonate of soda or just chamomile tea. Something to clean wounds, nothing more. Things that are easily available in our world, like hydrogen peroxide, which you can easily get in a supermarket, to kill the initial bacteria if you have a minor cut or a bleeding graze; gauze, alcohol, bandages, eye drops, aspirin, a few drops for otitis or to wash an eye infection. But even just toothbrushes will help. Even with just bicarbonate I can work wonders, fix digestion and take care of small infections. I can make very basic toothpaste with bicarbonate, herb paste and a little bit of alcohol, and show them how.

They live with nothing but what we in developed countries throw away into recycling containers or, worse still, with what the more fortunate countries of Africa itself throw away. My shaving mirror is the side mirror stripped off a car. I understand the terrible crisis we are all living through in our side of the world, but believe me, it is a rich man’s small cold when set against how more than 50% of the population lives in the real world. If you want to send any of the above whilst I am here, the address in Cameroon is:

KORIHE VANDI
Manuel Herranz / Cooperation Malima
S C Malima Primary School
Boite Postal 15 MOGODE
CAMEROUN

Otherwise, if posting is too troublesome, you can make a small 5 euro donation to La Caixa with the message # to manuel herranz, gouria # and the association will bring supplies and educational material over here on the next trip. Even small change that you have left over from any trip to Europe will be fine. You can find the bank account details on the website, and the organization, small as it is and run entirely by volunteers, can be contacted at supportgroup@malimaproject.org.

All funds can be pooled to sponsor a child for a couple of years and provide an opportunity for a brighter future, or even to run the next round of vaccinations, dental education, etc. If you cannot send a bank transfer, you can send cash safely in an envelope (wrap it up inside another, smaller envelope) FAO
Judith Burnett / Deborah Carr
Cambridge Community College
Calle Profesorado Espanol
Rocafort, Valencia
Spain

Luckily, the Post Office, the health care system and the national education system are some of the things that work well in Spain despite the government cuts. I’m not asking for much: it is what we pay for a pint of beer, or for a snack and a drink at the bar. If you prefer to donate to the bank account, you will find the details on the website, www.projectmalima.org, but receiving some items by post, like toothbrushes, toothpaste or aspirins, will make the whole town infinitely happy. For the sake of asking, I would ask for disinfectants, antibacterials or antimicrobials, with a brief explanation in French if possible, or otherwise in English.

Fixing the water problem is no big deal. I was in engineering long enough to realize that drilling a water well would suffice. One already exists, but the water is not potable. Other funding can go towards buying books for the library and purchasing textiles so that the children can go to school in a sort of uniform and not in rags.

Having a school also creates a local economy, as women can get involved in sewing uniforms and earn their first salary. It can all be done locally and the local community can get involved. This is no present from the First World. It would not be difficult in the medium term, but for now it will suffice to build up a stock of basic things such as bicarbonate, hydrogen peroxide, cotton, gauze, Mercurochrome, ibuprofen, aspirin and band-aids to cover the basics, or to save a small amount for a second well where people can go and collect water. Please pass this message on to as many people as possible. I can do a lot whilst I am here over the next 4 weeks, but when I am gone, the locals will not have a clue how to cure a rash, what the tablets are for or how to use a first aid kit, just as two PCs have been kept in their boxes because nobody knew how to set them up.

Old copies of software in French, such as text editors or spreadsheet programs like Excel or similar, for old-fashioned PCs, or OpenOffice for PCs running on 1 GB of RAM or less, will also be appreciated, as will any educational software in French. You can post them to the address in Mogode. Unfortunately, the website of the association has no PayPal account yet, as it is just a small association in Valencia. A local web company and I will work on the website when I return home so that even very small donations are possible.

Just think that what we spend on a coffee or tea a day can make a huge difference, such as having a child in school and not working the land from 5 years of age. The conditions in the government schools are very hard. That will require collection and funding, which I am prepared to do on a volunteer basis, engaging Pangeanic as much as possible. But that will be upon my return. What I spend some days on mineral water, 1000 or 1500 CFA (1 euro is equivalent to 650 Cameroon francs, so about 2.5 euros in total), is what an entire family of 6 people has to eat. It makes me feel sick with myself.

There are orphans everywhere, who are then adopted by grandparents or uncles, depending on circumstances. In addition, a relative can send you a child who will grow up with you if you have a good heart and things are going well. There is a boy of about 10, almost the same age as my son, living with the family I am staying with; he is sponsored by a Spanish family. He is no relative of theirs, but they have found him a place in the haystack, among sacks of corn and rice, and he is more than happy. For them, the opportunity to go to Malima School is like going to a good private school in any European city, where you can learn English in addition to the French that everyone is educated in and learns through life.

The family with whom I am staying in Gouria, about 3 km from Malima school, manages the organization of the school. Vandi, the father, is extremely helpful. I have dinner with them and share in all they do. I told them I do not want them to do anything special for me, but I cannot imagine what luxuries they could have. A sachet of tomato concentrate is a luxury item. Obviously, being fat is a sign of wealth. When I arrived, neighbors and people from other houses began to come out. All the authorities in the region know I’m here. It is absolutely safe and you are looked after. As Muslims mix with Catholics, Protestants of all branches and some animists, I had to notify each particular authority of my presence, from the gendarmerie to the Muslim subprophet and an Italian priest, to spread the word that there is an arassa man (pale skin) here for 2 months and that he has come to do good for the community and the school.

My mission is to build a library, plaster it, buy school books and with whatever is left, In time on me. I will also computerize the school, but at a very basic level, so that children can touch a keyboard and write their name and a couple of sentences on the screen. It will also help to keep a register of children and support communication with the children’s sponsors and the administration personnel, all volunteers. The teachers will be able to type and print exams for ages 6 to 12.

Sponsoring a child costs around 10 or 15 euros per month, which for us is a daily set menu and a drink. That guarantees school fees, teachers’ salaries, the maintenance of the buildings and some other projects like the non-potable water well, as you can see in the attached jpg.

I will take 3 buses, with an overnight stay at Garoua, then an overnight train to Yaounde and another bus to Douala to return to Valencia via Paris. It takes 3 or 4 days to get here, near the border with Nigeria. So if you post anything this week or the next, you will not only have my eternal gratitude, but you will also have helped a community of about 10,000 people get their first dental treatment and headache relief, and helped set up an ongoing First Aid Post in a forgotten land.

Best to all, and remember: nobody died thinking “I wish I had stayed a little bit longer in the office”, but many die thinking “I wish I could have done better”.
Manuel Herranz