Empirical Approaches to MT and Post-Editing

Pangeanic is a key participant in the EU-funded EXPERT project (EXPloiting Empirical appRoaches to Translation), which aims to train young and experienced researchers in the application of the latest machine translation techniques and to promote the research, development and use of hybrid language translation technologies. Our approach to hybridization at Pangeanic is to produce rules learnt from statistical facts, usually in post-processing modules. This is known as “normalization”, and we consider it more flexible than a purely rule-based approach, since new developments can be learnt much faster.
The EXPERT Project is based on the belief that a number of developments in example-based MT (EBMT) and statistical MT (SMT) have already shown the potential of corpus-based MT approaches to produce fast, low-cost translations. On the one hand, this has significantly increased output in a data-hungry society that produces massive amounts of data; on the other, it has reduced human effort, time and costs. Nevertheless, the adoption of machine translation technologies, and with it the role of the translator as a post-editor, is still far from being a reality. The main reason is that machine translation tools are not designed to aid professional translators. Their shortcomings include unproductive or even unfriendly user interfaces, which often take no account of translators' feedback in general or of post-editing in particular. We are therefore convinced that the great potential of the new tools remains to be fully exploited.
Translation Memory systems reached their peak long ago and, whilst they have proved to be a useful technology, there have been few breakthroughs in their core philosophy since the mid-1990s. They suffer from a number of well-known limitations, chiefly their poor performance on new texts that have not been translated before (no match).
Surprisingly, the view that the target users of these two types of translation technology are mutually exclusive or non-overlapping has resulted in little research into integrating them to provide better solutions. At EXPERT, we hold that there is no clear boundary between supposedly fully automatic translation (machine translation) and what we call semi-automatic translation (CAT-based translation). Instead, we consider both to be equally good tools for helping humans (professional translators and end-users) produce high-quality, reliable, fast and economical translations. EXPERT will accommodate the requirements of different types of users by prioritizing its research according to evidence about the needs and problems that users of translation technologies, both professional translators and translation “consumers”, encounter in real-life conditions.