
Using Linguistic Information for Machine Translation Hybridisation

Pangeanic’s development team gathered important input on advances in machine translation hybridisation from leading academics at the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011) and at the practical Saturday session ML4HMT (META-NET WP2), held in conjunction with DFKI. The sessions were aimed at development and R&D staff, and at anyone interested in drawing on the best MT research, in all its different flavours, to improve current state-of-the-art systems. Attendees and presenters were leading academics from the US, the European Union and Japan working in different areas of MT. The theme was state-of-the-art developments in system combination (often involving Moses) and in the hybridisation of rule-based approaches with statistical ones. Sessions dealt with combined approaches using syntax, grammatical information, rules and statistical systems. As research teams worldwide face the same problems, a range of approaches, some similar, some new and some quite imaginative, is beginning to emerge, for example:
  • Lemmatisation and annotation for morphologically rich languages such as Czech and Basque, the latter with even fewer available resources.
  • Syntax-based approaches and word reordering for distantly related language pairs (such as Asian or Semitic languages into and out of European languages).
  • Web-based annotation tools
  • Hybridisation of techniques, starting with analysis at the morphological layer (m-layer), then the analytical (a-layer) and tectogrammatical (t-layer) layers, followed by transfer and then synthesis back down through the t-layer, a-layer and m-layer.
  • Word-sense disambiguation.
  • Combining rule-based and statistical approaches to improve predictability.
  • Post-editing effort estimation for MT output, comparing systems that use no linguistic features with systems that use some. Linguistic features are directly useful for error detection and for automatic post-editing, but for sentence-level confidence estimation (CE) they raise issues of sparsity and of representation (length bias).
  • New evaluation metrics such as VERTa, which use linguistic knowledge organised at different levels (lexical, morphological and syntactic information, plus sentence semantics).
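To make the ordering in the layered hybridisation bullet concrete, here is a toy Python sketch that assumes only the layer names given above: m (morphological), a (analytical) and t (tectogrammatical). The function name `pipeline_stages` and the stage labels are hypothetical, and the code performs no linguistic processing; it only records the order in which a system of this kind would visit the layers, showing that synthesis mirrors analysis in reverse with transfer at the deepest layer.

```python
from typing import List

# Layer names from the bullet above, ordered from surface to deepest:
# m = morphological, a = analytical, t = tectogrammatical.
ANALYSIS_LAYERS: List[str] = ["m", "a", "t"]


def pipeline_stages() -> List[str]:
    """Return the ordered stages of a hypothetical layered MT pipeline.

    Toy illustration only: each entry names the step and the layer it
    would operate on; no actual analysis, transfer or synthesis happens.
    """
    stages: List[str] = []
    # Analysis climbs from the surface form up to the deepest layer.
    for layer in ANALYSIS_LAYERS:
        stages.append(f"analyse:{layer}")
    # Source-to-target transfer takes place at the deepest (t) layer.
    stages.append(f"transfer:{ANALYSIS_LAYERS[-1]}")
    # Synthesis walks back down the same layers in reverse order.
    for layer in reversed(ANALYSIS_LAYERS):
        stages.append(f"synthesise:{layer}")
    return stages


if __name__ == "__main__":
    print(" -> ".join(pipeline_stages()))
```

Keeping the layer list in one place means the synthesis order is derived from the analysis order rather than written out twice, which is the symmetry the bullet describes.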
[caption id="attachment_888" align="alignleft" width="150"] Carnegie Mellon University's Alon Lavie and Manuel Herranz exchanging views on hybridisation[/caption]
 
