Tag Archives: MT

Pangeanic translation technology in the press

Pangeanic’s translation technology developments have often been the focus of international media and think tanks, like TAUS or Localization World, where we have showcased our technologies and use cases. Now, Pangeanic’s efficient translation workflows, built on customized Moses-based machine translation, have also attracted local media attention.
The prestigious Spanish newspaper ElMundo.es printed a 3-page report with a full description of the company’s history, star use cases, and renowned machine translation applications and developments.

Click here to obtain a free PDF of the article inno14oct.pdf.

A few days ago, Valencia’s regional Finance Minister was interviewed by the online newspaper 20minutos. He cited Pangeanic’s participation in the Valencian Global program and its machine translation technologies as key to creating an “innovation ecosystem” that can generate “highly qualified jobs” thanks to its significant technological component. Pangeanic expects to grow its services and expand its global sales network as a result of taking part in the program.

Pangeanic has also appeared in other digital media, such as notasdeprensa, again within the Valencian Global internationalization framework, and entornointeligente, which covered its business development and coaching with leading entrepreneurship figures like MIT’s Bill Aulet.

Next time you think languages, think Pangeanic
Your Machine Translation Customization Solutions

         

Understanding Machine Translation Customization and DIY MT

by Manuel Herranz

The same mistake that was made by many translation agencies, translation companies and now language service providers is being made by tough machine translation companies: “My (machine) translation is better than yours”, “my machine translation system works, everybody else’s doesn’t”. Translation companies have learnt that they cannot sell translation services on “translation quality claims” alone or on “I am better than you because…”, but it seems that some machine translation companies have yet to learn the same lesson. I am referring particularly to those with risky levels of investment or venture capital to repay and without the testing ground of in-house native speakers or a real translation department in which to test their technologies and MT before release. At times, such companies obtained their “high quality clean data” by bombarding Google Translate and applying cleaning cycles which included manual revision by local, non-native graduates. Many LSPs fall for big marketing campaigns and strong wording – the limelight is always very attractive. Translation Memory technologies are good proof of that.

Bad-mouthing the competition is the worst marketing tool I could recommend to anybody in sales, marketing or representing a company. Talk about your strengths. Acknowledge what you cannot do, and explain what you can do to solve the problem. If you cannot match some offerings from the competition, saying that they do not work is a terrible policy. There are tens of use cases, applications, conferences and presentations to prove that, for example, DIY MT works and is in good health, being used at LSPs, institutions and corporations. As far as I know, automated retraining and Moses packaging are part of at least two EU-funded programmes. As organizations such as GALA provide an excellent platform for machine translation webinars, monopolistic attitudes become more and more aggressive.

But I want to minimize self-promotion. What Kirti Vashee seems to forget in his virulent blog entries is that no company will release a tool that doesn’t work, nor install a product that cannot do what it claims it can. I was an industrial engineer for long enough to learn at least the difference between what works and what doesn’t. When it comes to hardware tools, quality may be easy to spot. When it comes to services (and in machine translation this clearly means “my output”, “my clients”, “my productivity” and “my technological independence”), quality is what works best for me. Claiming that in 2013 MT is so complex that only one company fully understands it is presumptuous, to say the least.

Let me quote some translation agencies (the term Language Service Provider being unknown to the majority of people outside the language industry). They are not big companies, possibly what economists call small and medium-size companies.

Tilde, Apsic, Lexcelera, Pangeanic. I am sure at least four others could make it onto this list. What do these companies have in common? All of them were or are translation companies that have transformed themselves into higher-level solution providers, either by developing software that solved particular problems in translation or by customizing technology into their processes. With the help of EU funds and a clear vision to fill a market need, Tilde led R&D projects aimed at developing machine translation for less-resourced languages. Automated engine creation and re-training were part of the initial EU-funded project.

Apsic is the developer of one of the best consistency-checking tools (XBench), which is a must for any company wanting to ensure terminology consistency and error-free deliveries over hundreds of files.

Pangeanic has developed a management system on top of Moses which manages training sets and automatically cleans some data, trains engines and creates new engines with a variety of other customizable features.

As MT customizers, we know that initially some settings, parameters, weights and features need to be configured carefully to get a good start. But I do not know of any company in the software business that insists on manual processes and cannot automate what it has to do repetitively.


         

Pangea Machine Translation at Machine Translation Summit XIV

Machine translation technologies from Pangeanic’s Pangea division, and its involvement in the EXPERT program, will be presented at the poster session of the European MT Summit 2013, which will take place in Nice on 2-6 September.

Pangeanic is attending as a member of the EU-funded Marie Curie consortium EXPERT (EXPloiting Empirical appRoaches to Translation). The aim of this EU-funded action is to train young researchers, namely Early Stage Researchers (ESRs) and Experienced Researchers (ERs), promoting research, development and the use of hybrid LT (language translation) technologies.

Like other partners in the consortium, Pangeanic is to contribute to different project stages. At the end of the project, the technological division PangeaMT will oversee the implementation and evaluation of the new hybrid computer-aided translation technology proposed in EXPERT. The company will also collaborate by (co-)hosting and supervising a number of post-graduate and post-doctoral researchers.

Some topics of interest for the translation industry are the use of language technology to improve matching & retrieval in translation memories which the University of Wolverhampton will undertake, as well as the investigation of methodologies to evaluate the improved SMT, EBMT and TM prototypes and new hybrid computer-aided translation technology.

This R&D project will last for four years (48 months). It involves the development of next-generation translation-related technologies with the aim of addressing translators’ needs, technical developments in their job specification and market, and EU multilingual policy. Other partners, like the University of Amsterdam, will exploit hierarchical alignments for

- linguistically-informed SMT models to meet the hybrid approaches that aim at compositional translation; and
- a semantically-enriched SMT system that offers an extension to existing TMs to allow incremental, recursive partial matching of the input using hierarchical constructions containing variables.

The pan-European consortium comprises several leading universities and research groups in Europe. Pangeanic provides cover within a work package (implementing and testing hybrid machine translation technologies). It will also research what translators require from translation technologies and work on confidence estimation of corpus-based approaches to translation.

Expert Consortium Partners


The research project has received funding from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/ under REA grant agreement n°[317471]

Cloud Traffic and Data Will Increase Translation Services

Cisco estimates that global cloud traffic will grow 45% annually until 2016, with translation services growing at around 15% to 20% per year. According to Ian Henderson, CTO of Rubic, a translation and localization company, this means that many new machine translators must enter the industry each year to handle the content.

On the other hand, Raymond Kurzweil, one of the brightest minds in the world, director of technology at Google and a futurist known for his predictions about artificial intelligence, predicts that by 2029 machines will match human intelligence and perform several feats that seem like science fiction to us today, including human-quality translation.

Current happenings also suggest a strong role for non-human translation, with machine translation (MT) advancing rapidly. Three simultaneous-translation devices have been announced since June 2012, including one by Microsoft that renders live audio translations from the spoken word, respecting the tones and inflexions of the speaker.

Perfect is hard

But perfecting machine translation engines remains one of the toughest challenges in artificial intelligence. For several decades, computer scientists, with the help of armies of linguists, tried rule-based approaches, i.e. teaching machine translation systems the linguistic rules or similarities between two languages (sometimes unrelated languages, like English and Japanese) and including the necessary dictionaries. Progress was extremely slow and suffered several setbacks, like the ALPAC report in 1966.

Technology continued to advance until statistical systems, using vast amounts of data, made it possible to train translation engines fast and efficiently for several domains. See our presentation in Budapest, which includes a short history of machine translation.
[slideshare id=8510213&style=border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;&sc=no]

Click here for the longer version, a recommended review of a lucid article courtesy of Gadget Web Site.

Undoubtedly, content keeps growing, and the demand for translating online data into multiple languages is growing fast, even exponentially. Pangeanic launched its Pangea machine translation project in 2008, reporting real-life implementations at many events, and it is now a successful, customizable piece of software capable of re-training itself and creating engines on the fly. The project has gained international recognition and is part of EU-funded projects.

“Human translation and machine translation are kind of like ‘frenemies,’” translation expert Nataly Kelly said. “They live alongside each other, but not without a lot of tension.” Sometimes, machine translations are so atrocious, human translators prefer to start from scratch.

Machine translation companies and their output are becoming more and more ubiquitous every day. And as experts, we know that the aim of the technology is not to replace multilingual humans. Machines (or rather, automatic translation software) cannot fully replace human translators… yet. In fact, human translators often clean up machine translation (post-editing). Thus, the technology becomes an enhancer rather than a replacement.

It is this need for accuracy that keeps the (human translation) business growing. In fact, it is one of the few industries to have grown during the worldwide recession. It is approximately a $34 billion market. The machine translation market is around $200 million, with growth forecasts of around 18.65%.

“Demand for translation is booming because content creation is exploding,” says Kelly. “And since much of that content is created, and demanded, in multiple languages, human translators alone can’t keep up. They need machine translations to improve–and fast.”


         

I used to be a Translator, Now I Run Machine Translation (LocWorld London 2013)

by Manuel Herranz

It is only when looking back in time that one realizes how much work has been done and how far we are from where we used to be… what we call progress. That is what happened during our presentation of Pangea machine translation technologies at Localization World.

Our presentation summarized how the mix of Pangea technologies has enabled translation practitioners to empower themselves (see the launch of our DIY SMT in Barcelona, 2011) and be active in machine translation rather than just passive users or passive post-editors. The platform is mostly based on open source developments to allow flexibility and customization, but it also includes proprietary cleaning filters, translation engine creation and retraining, dataset management and a very powerful set of statistics so users can see improvements every step of the way.

Pangea is the story of a solution designed for translators, for applied language professionals. It is machine translation as a productivity enhancer, and these features are what have made a small internal project grow into a reference technology and a concept (DIY SMT) used worldwide. Currently, the company is also part of the EU-funded EXPERT project (Empirical Approaches to Hybrid MT and Post-Editing).

The story of Pangea DIY (S)MT will continue, applying its concept of flexibility and empowerment to language technologies, letting practitioners utilize their TMs as an engine-training tool, customizing their translation engines even with small data sets.

[slideshare id=24510059&style=border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px&sc=no]


   


Pangeanic in EU EXPERT Project Evaluating Hybrid Machine Translation

by Manuel Herranz

As recently published in our news section, Pangeanic is taking part in the EU-funded EXPERT Project.

EXPERT: EXPloiting Empirical appRoaches to Translation


EXPERT aims to train young researchers to promote the research, development and use of hybrid language translation technologies. In practice, EXPERT aims at improving translation practices and enhancing the productivity of relevant actors in the translation market. In this respect, EXPERT’s findings will set an agenda for new skills and jobs by promoting new job profiles based on empirical data from translation professionals (language service providers) and academia. The assumption of the project is that the true potential of MT remains to be exploited as a result of non-user-friendly interfaces, lack of awareness of translators’ feedback, etc.

However, Pangeanic already created and released a web-based tool that is able to organize material for Machine Translation by domain, maintain it and perform some cleaning routines, a key factor in our participation in the project. This web tool is also able to directly create engines by domain or by TMs and perform several operations on training sets before engine training. Following a revolutionary concept, Machine Translation engines are created or updated depending on domains, and a few clicks can set in motion several actions to provide ready-for-use (S)MT.
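
To give a feel for what “cleaning routines” on a training set can involve, here is a minimal Python sketch. It is not Pangeanic’s implementation: the function name clean_segments, the length-ratio threshold and the maximum segment length are all assumptions chosen for illustration. It simply normalizes whitespace and drops empty, overlong, badly mismatched and duplicate segment pairs before engine training.

```python
from typing import Iterable, List, Tuple


def clean_segments(
    pairs: Iterable[Tuple[str, str]],
    max_ratio: float = 3.0,   # assumed threshold, not a documented value
    max_words: int = 100,     # assumed limit, not a documented value
) -> List[Tuple[str, str]]:
    """Return the segment pairs that survive a few basic cleaning filters."""
    seen = set()
    kept = []
    for source, target in pairs:
        # Collapse runs of whitespace and trim both sides.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            continue  # one side is empty
        src_len, tgt_len = len(source.split()), len(target.split())
        if src_len > max_words or tgt_len > max_words:
            continue  # overlong segments tend to align poorly
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue  # lengths too different: likely a misalignment
        key = (source.lower(), target.lower())
        if key in seen:
            continue  # duplicate pair (ignoring case)
        seen.add(key)
        kept.append((source, target))
    return kept
```

A real pipeline would add language-specific checks (encoding, tags, numbers, language identification), but even filters this simple show why cleaning can be automated and repeated every time new material arrives.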

The web tool already incorporates hybrid features (such as those presented at JTF in Tokyo, 2011), and these will be tested, expanded and improved upon in EXPERT.

Our role within the 4-year project is to concentrate on results-driven testing of hybridization on the 6 official United Nations languages, carrying out a series of experiments on EN/FR/ES/ZH/RU/AR. These will include general pre- and post-processing rules designed to improve machine translation output. For example, some tests will alter training sets and evaluate the impact of reordering in certain language combinations, measuring gains when using purely statistical, syntax-based or factored models.

Pangeanic will focus on the automatic generation of bilingual written texts for multiple language combinations, alignment, segment cleaning and segment selection for bilingual engine building. We will also look at which hybridization techniques need to be incorporated and improved to tackle re-ordering issues and other linguistic phenomena in non-related languages. When dealing with language-specific issues, we will also delve into automatic quality metrics and how these can correlate to human, non-objective qualitative appreciations.

Using our tool, users can check engine statistics (e.g. BLEU score, number of segments, number of words) and behavior. For example, the engines can be used for translation and can be automatically updated with new or post-edited material, with further retraining possible via Pangeanic’s MT API or web. In this way we can measure the impact of new datasets and hybrid techniques on translation quality over time, and the project will benefit from existing, state-of-the-art technologies.
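
For readers unfamiliar with BLEU, the following Python sketch shows roughly how such a score is computed. It is a simplified, unsmoothed corpus-level BLEU with one reference per segment and plain whitespace tokenization; the function names are my own for illustration, and this is not the code behind the statistics shown in our tool.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU (0..1) with a single reference per segment."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

Scoring the same held-out test set before and after adding post-edited material is one simple way to see whether a retraining cycle helped, bearing in mind that BLEU only approximates human judgement.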


   

Translation Technologies at LocWorld (Part 2: Practitioners)

I will describe the rest of the very interesting Preconference Day and the presentations by the organizers (TAUS) as well as three other companies which are either machine translation developers or practitioners of automated translation solutions. The presentations brought different perspectives to the machine translation landscape, with efforts and advancements by several companies, including Pangeanic.

Maxim Khalilov from TAUS summarized the good work being done by the organization within the MosesCore project. His presentation was an invitation to visit and find out more about tools, data and resources. Amongst the tools, he mentioned other alternatives to Moses (Thot, for instance), as well as a collection of TAUS features on quality evaluation.

Important for new entrants or those with an interest in MT is the collection of data at TAUS: Europarl (1.8 million sentences), JRC-Acquis (270 paragraphs), Hansards, UN, OPUS, LDC Linguistic Data Consortium, ELRA, as well as TAUS’ own repositories, and many other features TAUS is developing.

Udi Hershkovich from Safaba presented his company’s effort to offer a commercial application translating not only documentation but all company communications, chat, etc., geared towards companies producing enormous amounts of content in many different channels.

He touched upon problems encountered by developers when applying the technology in several areas. For example, when translating HTML: even when dealing with perfect HTML, MT will translate keywords as well, and one may not hit the perfect keyword necessary for another market. This is an area for development and improvement for everyone.

An interesting point to many is that with Moses one needs a lot of data (or used to) but with other technologies you only need small sets of data to create MT. For example, one can look at similar translations, learn from them and then train engines.

Q&A time - Courtesy of Aspect Ukraine


Lori Thicke from LexWorks presented a pragmatic approach to the application of machine translation solutions. Lori talked about having different engines and an agnostic approach to machine translation, depending on the language pair. For Lori, MT is not a tool but a process, a process which needs to be understood in order to be utilized properly: one needs to understand training sets. Lori provided relevant ratings for “understandable” translations by developers like MS Translator, Systran Hybrid, etc., when translating online content.

Rahzeb and Max continued describing further EU-funded MosesCore projects by TAUS, such as the Quality Framework, and how the consortium is working on quality assessment and measurement. I found this and later sessions by Lucia Specia and QTLaunchPad on post-editing very enlightening, since they offer an overview of tools developed to provide metrics. I believe that, just as resistance to the adoption of computer-assisted tools in the 1990s diminished as the tools became desktop applications, the same is now happening with MT. In addition, metrics and the ubiquity of machine translation will provide objective criteria to raise the credibility of different translation technologies and standard QA functions across the industry.

Users need to feel and be empowered in order to engage with the technology and use it. That has long been Pangeanic’s philosophy (see our presentation at Localization World Barcelona, 2011). It is only when the technology does not seem to be imposed from above, but is perceived as “yet another tool to enhance productivity”, that machine translation will stop being considered a threat and become mainstream.

But we had better deal with that in our next entry. For now, I would like to invite you to visit our FAQ section.


   

How to build, run and own your machine translation ecosystem – Pangeanic at LocWorld London 2013

Pangeanic will exhibit its Pangea machine translation technology at LocWorld London 2013 inside ELIA’s booth. Over the 3-day event, you will have a chance to meet our representatives and see for yourself how PangeaMT works and how easy it is to create, manage and update translation engines, clean and segregate training material and, of course, obtain translations in portable and open formats (from TMX and XLIFF to XML-compliant docx, odt, html and ttx).


Manuel Herranz is also a guest speaker at the pre-conference day, where he will speak as an experienced implementer of machine translation (MT) technologies at LSPs and for large organizations with big publishing needs. Pangeanic was a founding member of TAUS, the industry think-tank, and of its spin-off, the Data Association. Advancements in machine translation led Pangeanic to become the first language service provider to successfully apply Moses, as recorded by the EU research program Euromatrixplus. Its ground-breaking DIY features, released in 2011, were “ahead of the times” and are now part of many MT offerings. Manuel will focus his talk on how machine translation is being adopted not only by technical departments but also by translators themselves, who see the opportunity of customizing engines as another helping tool in their jobs.

This is an abstract of the presentation:

Gone are the days when machine translation meant a desktop application, an off-the-shelf product or a post-editing job imposed by a translation agency.
Or are they? If you work for, own or outsource translation work to an LSP, machine translation may still sound like something exotic, something only “the big boys” do, something terribly complicated based on unintelligible algorithms.
Fear not. Since 2011, small agencies, translation departments and even freelance translators can create their own engines at will. Increasingly, the translator with a tailor’s complex is giving way to a data handler who can mix and match pre-built engines, add new material at will and serve pre-translations for fast post-editing. We do not need to wait for technology to evolve from electrons to photons for fast MT customization. It is already here. Some people used to be translators; now they have “MT Customization Specialist” on their name cards.
  


Machine Translation in Short

It is evident that certain documents require a human translator in order to interpret the subtleties of a language. Nevertheless, no matter how skilled a human translator may be, machine translation (also known as automatic translation or MT for short) exceeds the efficiency of a human translator.

Machine translation is generally used for subject-specific cases and this is where results and productivity rates are spectacularly higher. It allows individuals and companies to tailor their work according to the topic. Consequently, this enriches the output and quality of machine translation by cutting down on the number of choices for each word(s) to be translated.

This form of translation is extremely helpful in areas where formal language is used or phrases are repeated without much variation, such as administrative documents, which do not require the use of colloquial language and expression.

The potential of machine translation has been increasingly explored. In 2009, even President Obama mentioned that “highly precise automatic translation…could reduce the barriers faced in international commerce and collaboration.”

Companies such as Microsoft are pushing this field to the forefront to create the most efficient forms of translation. Simultaneous-translation devices are being explored worldwide, from London to Japan, where large mobile-phone companies like NTT DoCoMo have introduced an apparatus that translates phone calls between English and Japanese, or Chinese and Korean. More about this form of technology can be read in a recent article in The Economist.

Although simultaneous translation seems to be at the height of the translation industry’s innovation, machine translation remains an extremely sought-after technology; Microsoft’s Translator API (application programming interface) alone attracts over 10,000 commercial users. Its increasing investment in this field may have to do with the accumulation of information on the Internet and the value of social media: for example, Amazon, Facebook and Twitter have integrated Microsoft’s Translator Hub into their websites.

Our machine translation division PangeaMT has been a leader in developing fast-training and self-updating (DIY SMT) routines since 2011. This allows users to create small engines with their own material (TMX bilingual files) whilst profiting from the language coverage offered by larger engines, with a very rich set of quality features and functionalities.
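
As an illustration of how a TMX file can become training material, here is a minimal Python sketch that pulls source/target segment pairs out of TMX using only the standard library. The function name read_tmx is an assumption for this example, and the code is not part of PangeaMT; it relies only on the standard TMX layout of tu, tuv and seg elements with an xml:lang attribute (falling back to the older lang attribute).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(tmx_text: str, source_lang: str, target_lang: str) -> List[Tuple[str, str]]:
    """Extract (source, target) segment pairs for one language pair."""
    source_lang, target_lang = source_lang.lower(), target_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[lang] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs
```

Calling read_tmx(text, "en", "es") on the contents of a small English-Spanish TMX returns a list of segment pairs that can then be cleaned and fed to engine training. Note that language codes must match exactly apart from case, so a file tagged "en-GB" needs "en-gb" as the argument.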

Next time you think languages, think Pangeanic
Machine Translation Engines from PangeaMT


Pangeanic Christmas Party… All for Translation Automation!!

Let’s change our machine translation and translation automation focus for once and share the happiness of the Christmas period with everyone. All Pangeanic staff work very hard on all types of translation projects and translation consultancy so… it was time to celebrate!
