Machine Translation in Short

It is evident that certain documents require a human translator to interpret the subtleties of a language. Nevertheless, no matter how skilled a human translator may be, machine translation (also known as automatic translation, or MT for short) outstrips a human translator in efficiency.

Machine translation is generally used for subject-specific cases, and this is where results and productivity rates are spectacularly higher. It allows individuals and companies to tailor their work to the topic. This in turn improves the quality of the machine translation output by cutting down the number of candidate translations for each word or phrase.

This form of translation is extremely helpful in areas where formal language is used or phrases are repeated without much variation, such as administrative documents, which do not require the use of colloquial language and expression.

The potential of machine translation has been increasingly explored. In 2009, even President Obama mentioned that “highly precise automatic translation…could reduce the barriers faced in international commerce and collaboration.”

Companies such as Microsoft are pushing this field forward to create the most efficient forms of translation. Simultaneous-translation devices are being explored worldwide, from London to Japan, where large mobile-phone companies like NTT DoCoMo have introduced an apparatus that translates phone calls between English and Japanese, or Chinese and Korean. More about this form of technology can be read in a recent article in The Economist.

Although simultaneous translation seems to be at the height of the translation industry’s innovation, machine translation remains an extremely sought-after technology; Microsoft’s Translator API (application programming interface) alone attracts over 10,000 commercial users. Microsoft’s increasing investment in this field may have to do with the accumulation of information on the Internet and the value of social media: Amazon, Facebook and Twitter, for example, have integrated Microsoft’s Translator Hub into their websites.

Our machine translation division, PangeaMT, has been a leader in developing fast-training and self-updating (DIY SMT) routines since 2011. These allow users to create small engines with their own material (TMX bilingual files) whilst profiting from the language coverage offered by larger engines – with a very rich set of quality features and functionalities.
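
For readers who have never opened a TMX file, here is a minimal sketch of how the bilingual segments in one could be read before being used as training material. It is only an illustration built on Python’s standard library: the sample document, the `read_pairs` function and the language codes are made up for this example and say nothing about how PangeaMT itself ingests TMX.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-segment TMX sample, invented for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Renewal fee</seg></tuv>
      <tuv xml:lang="es"><seg>Tasa de renovación</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Patent application</seg></tuv>
      <tuv xml:lang="es"><seg>Solicitud de patente</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_pairs(tmx_text, src="en", tgt="es"):
    """Return (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup
                segments[lang] = "".join(seg.itertext()).strip()
        # Keep only translation units that contain both languages
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


if __name__ == "__main__":
    for source, target in read_pairs(SAMPLE_TMX):
        print(f"{source}\t{target}")
```

Each `tu` element is one translation unit and each `tuv` holds the segment in one language, so a small engine is in effect trained on a list of such pairs.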

Next time you think languages, think Pangeanic
Machine Translation Engines from PangeaMT

follow us on –> Follow manuelhrrnz on Twitter  @Pangeanic   @manuelhrrnz

Multilingual web is more than translation (1/2)

by Manuel Herranz

It is beyond doubt that the web has become multilingual. The work, experiences and cross-pollination with other disciplines, from machine translation to localization and semantics, were shared at the EU-sponsored Multilingual Web event, which took place in Rome on 12th-13th March 2013.

Whilst technologies such as machine translation are already well integrated for fast web page translation, it was reassuring to see that even large web actors such as Google consider there is plenty of work to do in making the web truly multilingual. The release of ITS 2.0 and the new features and possibilities that HTML5 opens up made the venue a meeting point for professionals, practitioners and academics dealing with the semantic web, translation and applied machine translation, as well as CMS tool providers.

Google’s experiences were shared by Mark Davis and Vladimir Weinstein, who pinpointed translation and localization issues that are often overlooked. We already assume that a page can be easily translated for gisting, but smaller issues like plurals and gender (“Alice added 1 people to his circle”) remain unsolved even in the likes of Facebook.
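
To make the “Alice added 1 people to his circle” problem concrete, here is a minimal sketch of a message that picks its wording from the count and the user’s gender. It is an illustration for English only, not how Google or Facebook build their messages; the function name `circle_message` and the gender labels are assumptions made for this example.

```python
def circle_message(actor, gender, count):
    """Build an English notification with matching plural and possessive forms."""
    # Fall back to a neutral possessive when the gender is not known
    possessive = {"female": "her", "male": "his"}.get(gender, "their")
    # English has only two plural forms for this noun: one vs. everything else
    noun = "person" if count == 1 else "people"
    return f"{actor} added {count} {noun} to {possessive} circle"


if __name__ == "__main__":
    print(circle_message("Alice", "female", 1))
    print(circle_message("Bob", "male", 3))
    print(circle_message("Sam", None, 2))
```

The hard part in real localization is that this logic cannot live in the code: other languages have more plural categories than English (Unicode CLDR defines zero, one, two, few, many and other), so the choice has to be expressed in the translatable message itself.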

Google got my language settings wrong

Everything’s better with a little sense of humour

Presenting people’s names in a locale is not as easy as quoting them. Patterns differ: English, for example, prefers to add nicknames after the given name, where other cultures have a second name or both the father’s and the mother’s names. Richard Ishida would deal with this later in a full presentation.

It was encouraging to see that, when localizing, Google faces the same hurdles as most translation companies:

  • Different messages to different translators
  • Most translators are not software engineers
  • Most engineers don’t speak 60 languages
  • May not know the gender

Making the web multilingual is not about translating (that may be fine for the content) but about presenting the web in a format and experience that are user-friendly in a given culture. Google has gone a long way in handling plurals, numbers written as digits, cardinals (1, 2…), ordinals (1st, 2nd…) and currency conversion, and even in identifying the likely country when you type a phone number like (011) 34345345 (much easier when it is +54 9 98408374). This is used in technologies such as geolocation in Android, and it also includes ways of resolving addresses, with detailed validation for many regions and a layout and basic validation for all regions.
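
Even something as small as the ordinals mentioned above needs locale-specific rules. As a minimal sketch, this is what the English rule alone looks like; the function name `ordinal` is made up for this example, and every other language would need its own version.

```python
def ordinal(n):
    """Return an English ordinal such as 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    # 11, 12 and 13 (and 111, 112, 113...) take "th" despite ending in 1, 2, 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


if __name__ == "__main__":
    print(", ".join(ordinal(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 22, 101, 111)))
```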

Studying the language trends of Gmail users, Google knows that a fairly large number of them are multilingual. I would agree with theories stating that about half of the world’s population knows at least one other language, is familiar with it or is bilingual. But, embarrassingly, things can get complicated if you sign up for a service (let’s say Google+) in one language from an IP location where a different language is used. This means that you may get mixed languages if you are a Spanish-speaking user signing up for YouTube whilst in Japan – and that’s personal experience…

Questions google cannot answer :)

Rendering names local
Richard Ishida gave a nice and funny presentation on the classification of names and how presenting and cataloging them into fields varies greatly (and the issues this involves), from India (where a “caste” tag would be necessary) to Spanish-speaking cultures, where people carry both the father’s and the mother’s surname (although there are exceptions to the order in which they are presented) and don’t change their surnames when they get married, just as happens in Chinese. This may look strange if you happen to come from a Northern European culture.

In Russian, where women adopt the husband’s family name (thus presenting a “search” challenge on the web), things are slightly different because surnames take masculine and feminine inflections. So the wife’s surname is not exactly the same as her husband’s. Both will also often carry their father’s name as a middle name (ending in -ich for men and -na for women) to state that they are the son or daughter of [name of father]:

Борис                        Николаевич             Ельцин
(Given name, Boris)  (Father’s name, masculine, Nikolayevich)  (Family name, masculine, Yeltsin)

Наина                         Иосифовна              Ельцина
(Given name, Naina)  (Father’s name, feminine, Iosifovna)  (Family name, feminine, Yeltsina)
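
The “search” challenge can be illustrated with a minimal sketch that maps the masculine and feminine forms of a surname to one lookup key. This is a deliberate simplification that covers only a handful of common endings and will misfire on names that merely happen to end the same way; the function name `surname_key` and the list of endings are assumptions made for this example, not a rule set presented at the event.

```python
# Common feminine surname endings that are the masculine form plus "а".
FEMININE_PLUS_A = ("ова", "ева", "ёва", "ина", "ына")


def surname_key(surname):
    """Map masculine and feminine forms of a Russian surname to one search key."""
    s = surname.lower()
    if s.endswith(FEMININE_PLUS_A):
        return s[:-1]           # Ельцина -> ельцин
    if s.endswith("ская"):
        return s[:-2] + "ий"    # Достоевская -> достоевский
    return s


if __name__ == "__main__":
    for name in ("Ельцин", "Ельцина", "Достоевский", "Достоевская"):
        print(name, "->", surname_key(name))
```

With a key like this, a search for Ельцин would also find records filed under Ельцина, which a plain string comparison would miss.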

In Arabic, you can add being the “father of” someone to your name later in life, as well as your place of origin and your qualities. This obviously affects forms of address.

One extreme case, though not so different from a classification perspective, is Icelandic, where what we might take for a surname is the father’s name plus a suffix marking the bearer as his son or daughter, as in Björk Guðmundsdóttir (“daughter of Guðmundur”).

Imagine, then, the challenge of finding, identifying and presenting people across different languages in an automated way…

The second part will wrap up the event with use cases, applied machine translation, CMS and Translation Management Systems.

Machine Translation Helps Patent Offices Worldwide

by Manuel Herranz

A unified patent court will come into effect at the beginning of 2014 in the wake of a European Parliament decision which will break a patent translation deadlock. This agreement still depends on the system being ratified by thirteen European Union states including Germany, France and the UK.

The European Patent Office (EPO) also officially welcomed the adoption of this resolution by the European Parliament in Strasbourg of two draft regulations on the creation of the unitary patent, hailing it as a historic achievement. “The European Union is to be congratulated on this decision, which clears the way for the completion of the European patent system with a unitary patent and a Unified Patent Court, which we have been waiting for in Europe for 40 years,” said EPO President Benoît Battistelli. Members of the European Parliament voted in favour of a new single European patent system which is expected to bring benefits to high-tech SMEs as Europe is finally set to get a new unified intellectual property system able to compete with the US and Japan.

It was only a week earlier that the EPO and the Chinese Patent Office (SIPO) launched a much-awaited Chinese-English machine translation service for patents. The service brings together the collections in the two largest patent languages as full-text documents on the same website, through the EPO’s global patent database, Espacenet, and makes them linguistically accessible to innovators from both regions using a single tool, Patent Translate. Thanks to the new arrangement with SIPO, Espacenet has grown by 4 million Chinese-language documents, adding to the 75 million documents already available.

Language policies have remained a huge stumbling block in the EPO’s policy, but the advent and advancement of machine translation technologies, particularly statistical machine translation, have given the process renewed impetus and speed. Recent sticking points over the proposed unitary system had centered on language translations, leading Italy and Spain to reject proposals. Translations currently represent a huge chunk of the overall cost of obtaining an EU-wide patent.

Bernhard Rapkay, lead MEP on the attempt to set up a unitary patent protection system, admitted that while the path towards its introduction had been “long and troubled” – largely because of language issues – the arduous process would ultimately prove to be worth the effort.

“Today’s vote is good news for the EU economy and especially for small and medium-sized enterprises (SMEs),” he said.

“Today a European patent issued by the European Patent Office (EPO) providing protection in the 27 EU Member States can cost up to €36,000, including up to €23,000 in translation fees alone,” states the European Commission, which believes that a new unitary patent will cost a maximum of only €6,425, with the costs of translation set to range from €680 to €2,380.
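
The Commission’s figures are easier to grasp as percentages. The short calculation below is only a sketch that takes the quoted upper-bound amounts at face value; the variable names are made up for this example.

```python
# Figures quoted by the European Commission (all in euros)
CURRENT_TOTAL = 36_000            # current cost, up to
CURRENT_TRANSLATION = 23_000      # of which translation fees, up to
UNITARY_TOTAL_MAX = 6_425         # maximum cost of a unitary patent
UNITARY_TRANSLATION = (680, 2_380)  # expected range of translation costs


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


# Share of today's cost that goes to translation
translation_share = percent(CURRENT_TRANSLATION, CURRENT_TOTAL)

# Reduction in total cost under the unitary patent
total_reduction = percent(CURRENT_TOTAL - UNITARY_TOTAL_MAX, CURRENT_TOTAL)

# Smallest reduction in translation cost (using the top of the new range)
translation_reduction_min = percent(
    CURRENT_TRANSLATION - UNITARY_TRANSLATION[1], CURRENT_TRANSLATION
)

if __name__ == "__main__":
    print(f"Translation share of current cost: {translation_share:.1f}%")
    print(f"Reduction in total cost:           {total_reduction:.1f}%")
    print(f"Minimum cut in translation cost:   {translation_reduction_min:.1f}%")
```

On these figures, translation accounts for roughly 64% of today’s maximum cost, and the unitary patent would cut the total by about 82%.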

Italian MEP Raffaele Baldassarre, who had been leading negotiations on the new translation regime, described the current cost burden as “effectively a tax on innovation”.

He pointed out that, even in Italy, 75% of companies register their patents in English. “The world of business has moved on, and we have to accept this,” Baldassarre said in a press conference following the MEP vote. “We have to understand that there is a need out there to facilitate innovation in the EU.”

Language breakthrough
Under the new scheme, patent applications will have to be made in either English, French or German – and will be made available in the same three languages. “If made in another language, they will have to be accompanied by a translation into one of these three languages,” declared the Parliament.

While that could be seen as unfair towards those outside the UK, France and Germany, MEPs also voted to fully reimburse translation costs for EU-based SMEs, non-profit organizations, universities and public research organizations – something expected to benefit small high-tech firms in particular.

Renewal fees, which are responsible for much of the overall cost of the current European system, will also be set at a level to help SMEs, the Parliament added.

Spain and Italy, the countries most affected by the new language rules, remain outside the new regime for now, but that will not stop the decision coming into effect – provided that 13 EU states, including the UK, France and Germany, ratify the plan.

That’s because the legal package is proceeding via the so-called “enhanced cooperation procedure”, which allows groups of EU states to move ahead together without the agreement of all members.

“Spain and Italy have so far opted out of the unitary patent package, but could join in the decision-making process at any time,” the Parliament says. “This procedure was used to break a deadlock, mainly due to language issues, that lasted over thirty years.”

Pangeanic Christmas Party… All for Translation Automation!!

Let’s change our machine translation and translation automation focus for once and share the happiness of the Christmas period with everyone. All Pangeanic staff work very hard on all types of translation projects and translation consultancy so… it was time to celebrate!

Help translation project to protect global internet freedom

The following statement was written by Ellery Biddle in the Advocacy section of Global Voices. We urge all our readers to share it and to understand the serious issue of Internet governance that is at stake. If the Internet can be controlled in the way that is proposed, and its openness constrained, our global rights as net citizens will be compromised. Ellery’s call follows below, as it appears in the blog.

“Over the next seven days, Global Voices Lingua volunteers will be translating a public online petition that supports the protection of human rights online and urges government members of the International Telecommunication Union (ITU) to preserve Internet openness at the upcoming conference of the ITU.

Open for sign-on by any individual or civil society organization, the Protect Global Internet Freedom statement reads as follows:

On December 3rd, the world’s governments will meet to update a key treaty of a UN agency called the International Telecommunication Union (ITU). Some governments are proposing to extend ITU authority to Internet governance in ways that could threaten Internet openness and innovation, increase access costs, and erode human rights online. We call on civil society organizations and citizens of all nations to sign the following Statement to Protect Global Internet Freedom:

Internet governance decisions should be made in a transparent manner with genuine multistakeholder participation from civil society, governments, and the private sector. We call on the ITU and its member states to embrace transparency and reject any proposals that might expand ITU authority to areas of Internet governance that threaten the exercise of human rights online.

To sign the petition, visit the Protect Global Internet Freedom website. To sign, enter your first name, last name, email address, organization name (if you are signing on behalf of a civil society organization), organization URL, and select your country.

All translations will also be posted on the petition site, which is hosted by OpenMedia, a Canada-based digital rights group.

As translations appear (see above), please feel encouraged to share links on social networks and with friends!”

EU reduces translation budget – Machine Translation and Post-editing, one future

by Manuel Herranz

On 21st November 2012, lawmakers approved a report by Stanimir Ilchev, a Bulgarian Liberal MEP, that will change the procedural rules for recording plenary debates. This decision could be a godsend for machine translation and language technology developers, as the EU plans to improve translation productivity (or times) by 25% – this being a target in current R&D Language Technology Funding Calls.

Starting from the next plenary, on 10th December, the European Parliament will no longer be required to translate the session into all 23 official languages of the EU. Over the years, this requirement has proved quite costly, and translation can take up to four months. However, a bias towards the English language has been pointed out in many circles and on many occasions. For example, Jean Quatremer, a renowned French political journalist from the French daily Libération, complained about the official press statements containing the Commission’s economic recommendations to member states, published on 30th May 2012. These statements had been eagerly awaited by the press because of the euro debt crisis, but initially were only made available to journalists in English. The translations into other languages followed a few hours later that day. Mr. Quatremer said that the initial monolingual release provided the Anglo-Saxon press with an “incredible competitive advantage” and threw the institutions’ democratic legitimacy into doubt, making his position very clear in a strongly worded blog entry.

From December 2012, the EU legislature will only record proceedings in the original language of the speaker. Nevertheless, the proceedings will still have to be translated into a particular language if a member state requests it. However, in the European Parliament many official press statements are currently published only in English and very few of them are translated into other languages – despite the huge efforts and money invested in translation services and, increasingly, in machine translation technology.

“This is one of our struggles – that the press releases and all publications and communications with society (tenders, contracts, etc.) are translated,” said Miguel Angel Martinez Martinez, the Parliament’s Vice-President in charge of multilingualism.

Numbers speak for themselves: 72% of all EU documents are drafted in English, with French coming a distant second with 12%. Only 3% are originally drafted in German. On the other hand, 88% of the users of the Commission’s Europa website speak English. In reality, “providing documents in English, French, German, Spanish and Italian would cover close to 100% of all the EU’s linguistic needs”, said the DG Translation Director-General Lönnroth, speaking at a debate hosted by the Centre for European Policy Studies on 22nd February. The Union “will just have to cope” with increasing linguistic pressures brought on by future enlargements because “no decision-maker would dare to touch the main principles” of the EU’s language policy.

Mr. Ilchev rejected proposals to translate the sessions only into English, as it would “appear linguistically unjust”. In the current EU, having 23 official languages means 506 translation and interpreting combinations, said Translation Director-General Lönnroth, a figure which can increase significantly when Croatia and Serbia join, and even Turkey in the foreseeable future.

Acknowledging he is not a “language fanatic”, the director-general claimed he thinks “about how to reduce the workload every day” as it was “not in the taxpayer’s interest” to provide every language combination. Lönnroth said back in February that “it would be easier if everybody accepted that English and French were the main EU languages”. This is what is (partially) going to happen, although Mr. Ilchev gives assurances that the initiative will not harm multilingualism, a principle enshrined in EU treaties: “of course this principle is not in question and everyone can listen to our debates in plenary in their own language” – through interpretation. Some of the EU’s research funding actually goes into technology solutions and research. For example, the SUMMAT project aims at creating an online service for subtitling by machine translation.
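
The figure of 506 combinations quoted above follows from counting every ordered pair of distinct languages, since translating from Bulgarian into Maltese is a different job from translating from Maltese into Bulgarian. A minimal sketch of the arithmetic (the function name `directed_pairs` is made up for this example):

```python
def directed_pairs(languages):
    """Number of source-to-target combinations among a set of languages."""
    if languages < 0:
        raise ValueError("expected a non-negative number of languages")
    # Each language can be translated into every other language
    return languages * (languages - 1)


if __name__ == "__main__":
    for n in (23, 24, 25, 26):
        print(f"{n} official languages -> {directed_pairs(n)} combinations")
```

Because the count grows with the square of the number of languages, each new official language adds more combinations than the one before: going from 23 to 24 languages adds 46, and going from 24 to 25 adds 48.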

End of Multimodal Pattern Recognition Project Hails New Features of Machine Translation

by Manuel Herranz

The closure of the MIPRCV project (Multimodal Interactive Pattern Recognition) at the beginning of November showcased real-life industry applications from Spanish Research & Development, with examples from bank La Caixa, Pangeanic for language and machine translation, Telefonica for image retrieval, etc.

All systems relied on the concept of using existing information (be it bank information like receipts, invoices and orders, translated bilingual files, classification of web images with and without text, etc.) and processing it on a dual basis: off-line training to produce good enough models, and user interaction that generates feedback from which the system learns and improves automatically. This is also the basis, for example, of Pangeanic’s DIY SMT system applied to machine translation.

Research and industry presentations ranged from prototypes to applied technology already in use in industry. One star application is the use of semantic information (and sometimes syntactic information, in the case of language), which provides very powerful cataloging and search capabilities to organizations such as banks, whilst similar techniques improve machine translation applications. Cataloging documents using semantic techniques saved bank La Caixa up to 15 minutes per transaction, thus improving productivity per bank branch and employee by reducing and automating tax, money and salary transfers, etc.

In translation, increases in productivity were measured by production per person in number of words and hours, with Pangeanic reporting successful cases of machine translation and post-editing at well over 20,000 words/day in controlled domains.

Other applications of multimodal pattern recognition dealt with online videos automatically classified by an interactive system using tag annotation and other techniques, image retrieval from the Internet on a given subject, cooperative detection/reaction of human actions (automated security disruption detection), application of pattern recognition to handwritten texts in order to digitize old documents, facial recognition by robots and ubiquitous robotics (human-robot interaction), advanced driving, improving hand and finger recognition, displacement of cubical data using gestures, control screens, etc.

Pangeanic is committed to continuing to expand its R&D capabilities in collaboration with large-scale scientific programs in order to bring the latest state-of-the-art developments into its PangeaMT technologies.

A few examples of how multimodal systems can be used for data processing and for image retrieval and processing can be viewed on Pangeanic’s YouTube channel, including moving robots!

Language Technology Industry Forum will incorporate

by Manuel Herranz

Representatives from Europe’s leading language technology companies met in Brussels on 8th-9th October and agreed to form a legal entity representing their interests. The main outcome of the Workshop, attended by over 50 representatives, was the decision to set up LT-Innovate as a legal entity to serve as representative body for LT vendors at European level. This decision comes in the wake of the very successful LT-Innovate Summit that brought together more than 160 LT stakeholders in June 2012.

Strengthening and raising the industry’s profile, and having a lobby that speaks with one voice to industry and government, are seen as increasingly necessary by many language technology companies, many of them SMEs. In a world and an EU brought increasingly together by economic forces and the Internet, the language technology industry must be an enabler and not a bottleneck in communication, thus opening new markets and business areas. Even within the EU itself, the situation is far from ideal. For example, Nikiforos Diamandouros, the European Ombudsman, has criticized the European Commission for refusing to conduct public consultations in all 23 official languages of the European Union. In a timely decision published two days after the event (11th October), Mr. Diamandouros found that the Commission’s practice of launching public consultations in only a few EU languages constituted maladministration.

Day 1 concentrated mainly on providing information about the EU’s new funding opportunities for Language Technologies and the direction future efforts will take. Several technology developers, like Pangeanic, proposed their ideas in order to obtain feedback from peers and EU representatives.

Pangeanic at LT Days

Day 2 concentrated on the mission statement, the organizing committees and the priorities of the key players in the European Language Industry, from machine translation to speech recognition and other areas, with a focus on the SME perspective. Joachin Hummel stated that “Only by organizing the industry, and presenting the voice of 100 companies or so to decision makers, we can put enough pressure to make our voices heard to those who have to take a decision on funding.”

Rubén Riestra spoke next, presenting on “Developing a Vision Statement for the Industry”. Sadly, he said, it is easier to move an apple across our borders than content across our digital borders. The EU has neither the military muscle to pull the economy along, as the US has, nor massive labor force resources like China’s. “We are flexible and we have a vision.” His vision is that “the European LT industry will build a language-neutral Digital Single Market before the end of the decade. If we do not address this, other players will come and take the market.”

“Funding agencies will fund results and not efforts. We need to build an infrastructure that others will take advantage of (like the roads that transport apples and cable companies and other industries use).” He continued by saying that “Europe needs to build an industry, in this case, the Language Industry, establishing legal frameworks and policies” by

  • lowering barriers
  • creating economies of scale
  • fostering skills

Technology is an enabler, not a solution in itself. Many LT companies find it difficult to find marketers and sales personnel to sell their technology easily as a plug-in, an enabler to other industries. LT needs to work together with other industries.

Rose Lockwood spoke next, presenting on “From niche to pervasive.”

Rose began by saying that the new ICT ecosystem is affecting Europe’s ability to compete. Before, there was a hardware and software market, a telecoms market, etc., but with the combination of platforms, data and applications the boundaries are increasingly fuzzy.

Rose Lockwood presenting at LT Day in Brussels

Now we have a first layer composed of hardware companies like Cisco, Samsung, Alcatel, Ericsson and Nokia, and a second layer of network operators like BT, Deutsche Telekom, Telefónica and Vodafone, but it is Layer 3 that we need to observe closely, as it includes the big data players, ranging from the BBC to Google, Microsoft, etc., and all the massive digital publishers, of course now including video (YouTube, etc.). That’s where the EU fails. ICT firms in the EU are not less R&D-intensive (typically they are more so), but they are

  • not specializing in the new ICT sector, innovating in the less dynamic parts of the ecosystem and missing out completely on new segments of the market
  • concentrating investment in Layer 2 (the least R&D-intensive in the ecosystem)
  • not generating new firms that become leading innovators

Thus, Europe has a problem with capturing value in Layer 3 markets (Bruegel) because

a) the digital market is not integrated
b) there are language barriers
c) the absence of ICT clusters reduces synergies.

However, Europe’s cultural differences could also be an opportunity to differentiate and create niches, conditional on being able to reach critical scale.

LT is a new ICT market that can unlock potential for Europe because it is founded on decades of European R&D with world-class levels of technological expertise. It also addresses linguistic barriers directly, with multilingual and translation capabilities. Furthermore, it enables advanced communication with intelligent natural interfaces. It is the only technology capable of handling the massive amounts of data we are facing today.

The size of our industry is approximately €20 billion worldwide, growing at 11% a year. There are 3 segments in the industry: intelligent content, translation and speech technologies. The first is overwhelmingly US-based, Europe takes the lead in the second, whereas in speech technology Europe is quite level with the rest of the world. The world is moving from a fee-based model, charging per license, to one where enterprises monetize through usage fees. So, where can innovation happen in Layer 3? In data, applications and platforms, and across many industries: Life Sciences/Healthcare, for example, will need clinical data, with an application in public health monitoring and a platform on which to run it.

The innovation is going to come from platforms oriented to a sufficiently large domain, and the industry has enough room to succeed as long as it is multidisciplinary: for example, in the shape of a High Quality Machine Translation (HQMT) platform, a self-service voice cloning platform, etc. It is also going to come from applications based on integrated LT features, which are likely to be specific to a vertical industry or domain, although there are also opportunities for existing applications replicated for new markets (speech components for new languages, analytics services, etc.).

Philippe Wacker spoke later, calling for the generation of an industry vision. Today, niche technologies are in silos, and our next step is to have a single industry with a sense of belonging together and a common vision/map for the way forward. Firstly, LT must be the cornerstone of the Digital Single Market: the European Language Cloud (infrastructure). Secondly, LT must be a pervasive key enabling technology for vertical market segments in which Europe has a global competitive advantage that can be consolidated and reinforced. In order to build the ecosystem, demand has to come from the buyers, who state their needs. It seems that nowadays vendors have solutions that they cannot market or for which there is no market! Researchers are the engines that facilitate the whole process. The mixture of the three creates innovation engines where we can identify and catalyze vertical innovation value chains.

LT-Innovate will provide a forum for buyers and vendors of LT technologies to discuss their needs and thus guide future R&D. The agenda is to articulate/promote vis-à-vis other stakeholders (researchers, buyers, investors) and policy makers. Within the next few months, the organization will pursue several objectives: identifying innovation opportunities for its members, initiating collaborative projects that address those opportunities, removing inhibitors and implementing solutions. Finally, it also envisages organizing a Buyer Focus Group.

The decision was made to incorporate LT-Innovate as a legal entity, creating a Company Limited by Guarantee, based in the UK, which will have not-for-profit status. Philippe’s full presentation has been made available here.

Next time you think languages, think Pangeanic
Machine Translation Engines from PangeaMT

follow us on –> Follow manuelhrrnz on Twitter  @Pangeanic   @manuelhrrnz

NTT DoCoMo prepares Japanese machine translation through Android

Japan is unique in many ways, and this is reflected in its culture and its challenging language. Japanese, whose classification as an Altaic language is disputed, is spoken by around 127 million people. Its intrinsic characteristics make it a challenge for machine translation and other forms of translation automation, although Pangeanic, in collaboration with Toshiba, has reported several advances in hybrid MT (as published in the Asian Association of Machine Translation in 2011 and presented at the Japan Translation Festival; see presentation here).

Making calls to other countries is a challenge for Japanese speakers: locals often don’t have much choice but to learn someone else’s language or hope there’s a Japanese speaker on the other end of the line.

If all goes well, NTT DoCoMo’s planned Hanashite Hon’yaku automatic translation service will make international calls as comfortable as phoning a store in Nagano. As long as a subscriber has at least an Android 2.2 phone or tablet on the carrier’s moperaU or sp-mode plans, the service will automatically convert spoken Japanese to another language, and reverse the process for the reply, whether it’s through an outbound phone call or an in-person conversation.

The service is scheduled to launch on 1 November, when it will translate from Japanese to Chinese, English and Korean. Machine translation from Japanese into five more European languages (French, German, Italian, Portuguese and Spanish) plus two more Asian languages (Indonesian and Thai) will be added to the application in late November, raising the number of non-Japanese languages to 10, according to NTT DoCoMo’s press release.

If you are not so patient, NTT DoCoMo will provide a holdover on October 11th through Utsushite Hon’yaku, a free Word Lens-like augmented reality translator for Android 2.3 that can convert text to or from Japanese with a glance through a phone camera.

The app will be available free of charge. Users pay call and data charges for phone-to-phone conversations and translation data for screen text and voice readouts. Only data charges apply for face-to-face conversations, since no call is required. Subscription to DOCOMO’s “sp-mode” or “moperaU” connection service is required.

Utsushite Hon’yaku translates short written text between Japanese and either English, Chinese or Korean.

Translation is virtually instantaneous once the device’s camera captures the text. This commercial version of Menu Translator, which DOCOMO is trialing in Japan until October 31, will translate words and phrases not only on menus, but also on street signs, signboards and more. Translation from Japanese is also possible, so DOCOMO expects the app to be quite useful for foreign visitors to Japan.

The Utsushite Hon’yaku app will be available free for download (data charges may apply). Usage will not incur any transmission fee since the translation process does not require network connection. It can be used on any smartphone or tablet equipped with an outer camera and running Android 2.3 or higher.

European Day of Languages: A Call from META-NET

by Manuel Herranz

In the wake of the European Day of Languages, the following communication was published by META-NET. This organization is the Network of Excellence dedicated to fostering the technological foundations of a multilingual European information society. Pangeanic is a member and, through its technology division PangeaMT, links up with other organizations that have an interest in machine translation.

The message of the call is important enough to be reproduced in its entire form, without editing, with the permission of the organization.

At Least 21 European Languages in Danger of Digital Extinction

Good News and Bad News on the European Day of Languages

Most European languages are unlikely to survive in the digital age, a new study by Europe’s leading Language Technology experts warns. Assessing the level of support through language technology for 30 of the approximately 80 European languages, the experts conclude that digital support for 21 of the 30 languages investigated is “non-existent” or “weak” at best. The study “Europe’s Languages in the Digital Age” was carried out by META-NET, a European network of excellence that consists of 60 research centres in 34 countries, working on the technological foundations of multilingual Europe.

Europe must take action to prepare its languages for the digital age. They are a precious component of our cultural heritage and, as such, they deserve future-proofing. The European Day of Languages on September 26 recognises the importance of fostering and developing the rich linguistic and cultural heritage of our continent. The META-NET study shows that, in the digital age, multilingual Europe and its linguistic heritage are facing challenges but also many possibilities and opportunities.

Languages of Europe

Distribution of languages in the European continent

The study, prepared by more than 200 experts and documented in 30 volumes of the META-NET White Paper Series (available both online and in print), assessed language technology support for each language in four different areas: automatic translation, speech interaction, text analysis and the availability of language resources. A total of 21 of the 30 languages (70%) were placed in the lowest category, “support is weak or non-existent” for at least one area by the experts. Several languages, for example, Icelandic, Latvian, Lithuanian and Maltese, receive this lowest score in all four areas. On the other end of the spectrum, while no language was considered to have “excellent support”, only English was assessed as having “good support”, followed by languages such as Dutch, French, German, Italian and Spanish with “moderate support”. Languages such as Basque, Bulgarian, Catalan, Greek, Hungarian and Polish exhibit “fragmentary support”, placing them also in the set of high-risk languages.

“The results of our study are most alarming. The majority of European languages are severely under-resourced and some are almost completely neglected. In this sense, many of our languages are not yet future-proof,” says Prof. Hans Uszkoreit, coordinator of META-NET, scientific director at DFKI (German Research Center for Artificial Intelligence) and, together with Dr. Georg Rehm (DFKI), co-editor of the study. Dr. Georg Rehm adds: “There are dramatic differences in language technology support between the various European languages and technology areas.

Family of European Languages

The gap between ‘big’ and ‘small’ languages still keeps widening. We have to make sure that we equip all smaller and under-resourced languages with the needed base technologies, otherwise these languages are doomed to digital extinction.”

The field of language technology produces software that can process spoken or written human language. Well-known examples of language technology software include spell and grammar checkers, interactive personal assistants on smartphones (such as Siri on the iPhone), dialogue systems that work over the phone, automatic translation systems, web search engines, and synthetic voices used in car navigation systems. Today language technology systems primarily rely on statistical methods that require incredibly large amounts of written or spoken data. Especially for languages with relatively few speakers it is difficult to acquire the needed mass of data. Furthermore, statistical language technology systems have inherent limits in their quality, as can be seen, for example, in the often amusing incorrect translations produced by online machine translation systems.

Europe has succeeded in removing almost all borders between its countries. One border still exists, however, and it seems to be impenetrable: the invisible border of language barriers is one that hinders the free flow of knowledge and information. It also harms the long-term goal of establishing a single digital market because it hinders the free flow of goods, products, and services. While language technology has the potential to get rid of language barriers through modern machine translation systems, the results of the META-NET study clearly show that many European languages are not yet ready. There are significant gaps in technology due to the English-language focus of most R&D, a lack of commitment and financial resources, and also a lack of a clear research and technology vision.

A coordinated, large-scale effort has to be made in Europe to create the missing technologies as well as transfer technology to the majority of languages. There are strong reasons for approaching this immense challenge in a community effort involving the EU, its member states and associated countries, as well as industry: the high per-capita financial burden for smaller language communities; the needed transfer of technologies between languages; the lack of interoperability of resources, tools, and services; and the fact that linguistic borders often do not coincide with political borders.

Language Technology: Background

Language technology already supports us in everyday tasks, such as writing e-mails or buying tickets. We benefit from language technology when searching for and translating web pages, using a word processor’s spell and grammar checking features, operating our car’s entertainment system or our mobile phone with spoken commands, getting recommendations in an online store, or following the instructions spoken by a mobile navigation app.

In the near future, we will be able to talk to computer programs as well as machines and appliances, including the long-awaited service robots that will soon enter our homes and work places. Wherever we are, when we need information or help, we will simply ask for it. Removing the communication barrier between people and technology will change our world.

Language technology is generally acknowledged today as one of the key growth areas in information technology. Large international corporations such as Google, Microsoft, IBM, and Nuance have invested substantially in this area. In Europe, hundreds of small and medium enterprises have specialised in certain language technology applications or services. Language technology allows people to collaborate, learn, do business, and share knowledge across language borders and independently of their computer skills.

The META-NET White Paper Series

The META-NET White Paper series “Europe’s Languages in the Digital Age” reports on the state of 30 European languages with respect to Language Technology and explains the most urgent risks and opportunities. The series covers all official EU Member State languages and several other languages spoken in Europe. While there have been a number of valuable and comprehensive scientific studies on certain aspects of languages and technology, until now there has been no generally understandable compendium that presents the main findings and challenges for each language with regard to a technology-supported multilingual Europe. The META-NET White Paper Series fills this gap. META-NET can now show why most languages face serious problems and pinpoint the most threatening gaps. In total, more than 200 authors and contributors helped prepare the Language White Papers.

The white papers were written for the following European languages: Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian (bokmål and nynorsk), Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, and Swedish. Each Language White Paper is written in the language it reports upon and includes a complete English translation.

About META-NET and META

META-NET, a Network of Excellence consisting of 60 research centres from 34 countries, is dedicated to building the technological foundations of a multilingual European information society. META-NET is co-funded by the European Commission through a total of four projects.

META-NET is forging META, the Multilingual Europe Technology Alliance. More than 600 organisations from 55 countries, including research centres, universities, small and medium companies as well as several big enterprises, have already joined this open technology alliance.

Background Information / Volumes / Press Releases / Quotes:

 
Contact:
Prof. Dr. Hans Uszkoreit
Dr. Georg Rehm
META-NET Office c/o DFKI GmbH
Alt-Moabit 91c
10559 Berlin, Germany
Phone:       +49 30 23895-1833
Email:       georg.rehm@dfki.de