All About Language Technology and Machine Translation With Maite Melero

Synopsis: Maite Melero is a senior researcher specializing in language technology, machine translation and natural language processing. She currently carries out her research at both the Barcelona Supercomputing Center and Pompeu Fabra University. In this interview with Pangeanic, she gives us a glimpse of her life as a researcher, discusses the present and future of language technology, and talks about minority languages. She also shares her thoughts on the coexistence of human and machine translators.

Interview With Researcher Maite Melero on Language Technology and Machine Translation

Maite Melero: "My dream is for technology to solve the difficulties of communicating across language barriers and to ensure the survival of the planet's linguistic diversity."

Maite Melero is a senior researcher with a long history of academic and professional work in the field of language technology, machine translation and natural language processing. She currently carries out her research at both the Barcelona Supercomputing Center and Pompeu Fabra University.

After completing her studies, Maite Melero began her professional career on a pioneering European machine translation project. It was the late 1980s, and this large-scale project aimed to create automatic translation systems between all the European Union’s languages (nine, at the time). Although she acknowledges that it was not very successful, "it served to lay the foundation for what is now natural language processing research in Europe."

In 1998, she travelled to the United States to work as a computational linguist. Among other things, she developed the Spanish grammar correction tool for Microsoft Word.

After that, back in Spain, she worked for several years in the field of natural language processing at a center linked to Pompeu Fabra University. Among the projects in which she was involved, a particularly noteworthy one developed "tools for extracting information from audiovisual material, both in the form of audio and images."

Since 2017, she has been involved in the Plan for the Promotion of Language Technology with the Secretary of State for Digitalization and Artificial Intelligence of the Ministry of Economic Affairs and Digital Transformation.

Since then, she has mainly devoted her work to machine translation. However, she also takes an interest in other language processing tasks while "trying to bring both machine translation and language technology in general into public administrations."

Language technology research: current and future

Maite Melero recognizes that natural language technology has evolved a great deal "because the initial machine translation systems were just collections of dictionaries and formal grammar rules. Text was analyzed at the morphological, syntactic and semantic levels." These were rule-based systems that required manual work by linguists and were therefore limited and not very cost-effective.

In the 1990s, the first statistical systems appeared. They took advantage of the large amount of data available in digital format. In the field of translation, it meant "taking advantage of existing translations already performed by human translators. These were fed to the machine and it calculated what would be the most likely translation of a given word or sequence of words."
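
As a rough illustration of the idea described above, the following minimal Python sketch estimates the most likely translation of a word by relative frequency. It is a toy under stated assumptions, not how any production system mentioned in this interview works: the word pairs in `aligned_pairs` are invented, they are assumed to be already aligned word by word, and the function names are hypothetical. Real statistical systems had to estimate those alignments from whole sentence pairs.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: Spanish -> English word pairs assumed to be
# already aligned from human translations (invented for illustration).
aligned_pairs = [
    ("casa", "house"),
    ("casa", "house"),
    ("casa", "home"),
    ("banco", "bank"),
    ("banco", "bank"),
    ("banco", "bench"),
]


def build_translation_table(pairs):
    """Estimate P(target | source) as a relative frequency of the counts."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        table[source] = {t: c / total for t, c in target_counts.items()}
    return table


def most_likely_translation(table, word):
    """Return the highest-probability translation, or None for unseen words."""
    candidates = table.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


table = build_translation_table(aligned_pairs)
print(most_likely_translation(table, "casa"))   # house
print(most_likely_translation(table, "perro"))  # None (never seen in the data)
```

The second call shows the limitation that matters for low-resource languages: a word that never appears in the existing translations cannot be translated at all.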

However, the real revolution came in 2014 with the application of deep learning neural networks. "These algorithms were not new, but they were beginning to be applied as the computational power increased at a tremendous rate. This had a real impact in many different areas, such as image identification, speech recognition and of course, machine translation."

This phenomenon has enabled important advances, such as large language models. "They allow for the construction of pre-trained models that can be adapted to any processing task, such as sentiment analysis, automatic summarization or commercial systems." 

For Maite Melero, the future of this technology holds many bright things: "I would say that we are still in the midst of this revolution. It is a very interesting time in which continuous improvements and developments are being made," although she acknowledges that there is still a gap between research and the market. "More emphasis needs to be put on technology transfer. Despite the fact that advances are moving rapidly, they take time to reach the places where we work, such as public administrations."

Minority or minoritized languages

According to UNESCO, about 96 percent of the world’s 6,000 languages are spoken by only 3 percent of the world's population.

Among her many lines of research, Maite Melero focuses on the study of minority or minoritized languages (languages that have been marginalized, in many cases, because they have no political value). "These languages are difficult to work with because the technology needs large amounts of data to learn. The more data, and the higher their quality, the better the technology." 

In this regard, Maite Melero emphasizes that neural networks are complex systems that learn from immense amounts of data. "They are capable of internalizing general linguistic knowledge from large amounts of data in a given language, and then transferring this knowledge to languages for which little data is available." This is extremely important for minority languages because the languages with more data help those with less.

Maite Melero tells us that one of the lines of research she is currently focusing on is unsupervised learning. "The system learns without seeing the translations beforehand," in contrast to supervised learning, where the system learns from previous translations.

European research on machine translation

The European NTEU project, for which Pangeanic led the consortium, focused on the development of near-human-quality machine translation engines. They are based on neural networks and work to and from all of the European Union’s 24 official languages. Maite Melero actively participated in this project and recognizes that "it was very ambitious because it aimed to develop language engines for the European Union’s 24 official languages. This meant a total of 552 machine translation systems and a goal to exceed the quality provided by Google for these pairs in more than 80 percent of the engines." 
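
The figure of 552 systems follows from counting every ordered pair of distinct languages, since each translation direction needs its own engine: 24 × 23 = 552. The short Python check below is only a worked version of that arithmetic. The language labels are placeholders rather than real language codes, and the threshold of 442 engines is simply what "more than 80 percent" of 552 works out to, not a number quoted in the interview.

```python
from itertools import permutations

NUM_LANGUAGES = 24  # official EU languages, as stated in the article

# Each ordered (source, target) pair with source != target is one engine.
directed_pairs = NUM_LANGUAGES * (NUM_LANGUAGES - 1)
print(directed_pairs)  # 552

# Cross-check by enumerating the pairs with placeholder labels.
languages = [f"lang{i:02d}" for i in range(1, NUM_LANGUAGES + 1)]
engine_pairs = list(permutations(languages, 2))
assert len(engine_pairs) == directed_pairs

# "More than 80 percent" of the engines: smallest whole number above 80%.
min_engines_above_80_percent = directed_pairs * 80 // 100 + 1
print(min_engines_above_80_percent)  # 442
```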

Another European initiative in which Maite Melero participated is the MAPA project on data anonymization for public administrations, for which Pangeanic also led the consortium. With this tool, contact data is automatically removed so that personal identifiers cannot be traced. This initiative "is a very useful tool because public administrations are obliged to protect personal data, according to data protection legislation. In this way, public service information can also be reused." The tools developed thanks to this project have been incorporated into the central linguistic services of the European Commission and are also used by the Spanish Ministry of Justice.
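
To make the idea of automatically removing contact data concrete, here is a deliberately simple Python sketch based on regular expressions. It is not the MAPA tool or its method, which the interview does not describe. The patterns, the placeholder tags and the function name `redact_contact_data` are all assumptions made for illustration, and the sample sentence is invented.

```python
import re

# Simplified, assumed patterns: real anonymization needs far broader coverage.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact_contact_data(text):
    """Replace e-mail addresses and phone-like numbers with placeholder tags."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


sample = "Contact Ana at ana.perez@example.org or +34 600 123 456."
print(redact_contact_data(sample))
# Contact Ana at [EMAIL] or [PHONE].
```

Note that the name "Ana" survives in the output. Fixed patterns cannot catch names, addresses or other identifiers that have no regular shape, which is why anonymization of real documents is treated as a language technology problem rather than a search-and-replace task.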

One of Maite Melero's concerns is related to gender bias in the data. This is a growing line of research in Europe because of the complexity of the issue. "The models learn from the bias that is already present in the data, but they also amplify it."

"From a technological point of view, an effort has to be made to rebalance the data that produces these biases."

The coexistence of human and machine translation

For Maite Melero, the coexistence of human and machine translation "is already present." According to her, it is "inconceivable to approach translation without the use of technology." The role of the translator will be redefined as "a validator, with the exception of those translations that by nature require the transcreation of the original content."