Pangeanic Presents Its Innovative Approach to Anonymized Text Translation at EAMT 2023

Written by Marisol Letelier | 06/14/23

We attended the European Association for Machine Translation (EAMT) 2023 event, where our CTO, Konstantinos Chatzitheodorou, presented a paper he co-wrote with Mª Ángeles García, Head of Machine Learning, and Carmen Grau, Machine Learning Engineer. In his presentation, he introduced a workflow that combines machine translation and human editing to translate anonymized texts accurately. As a leading company in Artificial Intelligence and Natural Language Processing solutions, our main goal with this approach is to reconcile the conflicting needs of data privacy and translation quality.

 

 

Translating anonymized texts poses several challenges. Anonymization can remove context and discard useful information, such as cultural or historical references. Non-standard language or jargon, and the complex grammatical and syntactic structures of certain languages, make the task harder still. In addition, cultural differences between source and target languages can lead to inaccuracies in translation.

Our workflow combines machine translation and human editing to achieve accurate translations while protecting sensitive information. The first step is pseudonymization, where we replace the sensitive data in the original text with alternative information that maintains the necessary context. We then apply machine translation to the pseudonymized text to generate a preliminary translation, which professional translators review and correct at the human editing stage. Finally, we replace the pseudonymized entities in the edited text with the corresponding entities from the original text.

Our workflow is flexible and can be adapted to various machine translation frameworks, and its architecture includes alignment components and integration with computer-assisted translation tools. In addition, we have shown that this approach improves translation quality and reduces human editors' workload. Its combination of human expertise and machine translation is particularly valuable in sensitive sectors such as healthcare, banking, or law. 

We evaluated the effectiveness of our workflow using subjective and objective measures. In the subjective study, 14 participants rated different pseudonymization options, and pseudonymized text and tag-based coding were considered the most appropriate. However, we identified certain issues that required automated downstream processes and addressed them. In the objective evaluation, participants assessed different post-editing alternatives with machine-translated entity replacements. Pseudonymized text received higher ratings than numeric code or tag substitution, as the pseudonymized entities better preserved the meaning and characteristics of the original text.
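To make the three substitution styles concrete, the sketch below renders one sentence in each of them. The exact tag and numeric code formats, the entity labels, and the sample sentence are assumptions made for illustration; they are not taken from the paper.

```python
import re
from typing import Dict, List, Tuple

# (original entity, entity type, surrogate); all values are hypothetical.
Entity = Tuple[str, str, str]


def render(text: str, entities: List[Entity], style: str) -> str:
    """Rewrite `text` with each entity replaced according to `style`."""
    mapping: Dict[str, str] = {}
    for index, (original, label, surrogate) in enumerate(entities, start=1):
        if style == "pseudonym":
            mapping[original] = surrogate      # realistic replacement entity
        elif style == "tag":
            mapping[original] = f"<{label}>"   # assumed tag format
        elif style == "code":
            mapping[original] = f"[{index}]"   # assumed numeric code format
        else:
            raise ValueError(f"unknown style: {style}")
    if not mapping:
        return text
    # Single-pass replacement, longest entities first.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


sentence = "Maria Lopez was treated in Valencia."
entities = [
    ("Maria Lopez", "PERSON", "Anna Schmidt"),
    ("Valencia", "LOCATION", "Lisbon"),
]

for style in ("pseudonym", "tag", "code"):
    print(f"{style}: {render(sentence, entities, style)}")
# pseudonym: Anna Schmidt was treated in Lisbon.
# tag: <PERSON> was treated in <LOCATION>.
# code: [1] was treated in [2].
```

Only the first variant is still a natural sentence, which is one plausible reason a translation engine and a human editor would handle it more reliably than the tagged or coded versions.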

To validate the quality of the translation, five professional translators post-edited the pseudonymized versions into Spanish and German, with subsequent replacement of entities. The results indicated that pseudonymization produced correct and accurate translations, preserving the original meaning, while other anonymization options sometimes introduced grammatical problems or were misleading. 

While our workflow improves translation quality and reduces the human editors' workload, it also presents some risks, such as machine translation inaccuracies and errors in the automated alignment process.

These findings provide valuable insight that will help improve and refine our methodology in future research. 

Pangeanic's presentation at EAMT 2023 highlights our commitment to machine translation innovation and our ability to address the challenges of translating anonymized text. This promising approach offers a solution for organizations that need both data privacy and high-quality translation.

 

 

We are delighted to have had the opportunity to be silver sponsors of the prestigious EAMT 2023 event.

Taking part in this international gathering of the machine translation industry has been an extraordinary experience. We were able to share our knowledge and collaborate with many professionals and experts. The event gave us an invaluable platform to present our innovative solutions and make meaningful connections with leaders and companies in machine translation. We are honored to have contributed to the success of the event and look forward to continuing to collaborate to boost the growth and innovation of this ever-evolving industry.