“Language is no longer human” - A review of the Multilingual magazine article

Multilingual magazine has recently published an article I wrote at the end of last year. “Language is no longer human” is both a statement and a thought-provoking exploration of what I consider a turning point for the translation industry as it transforms into a true "language industry".

The Oxford English Dictionary defines industry as the "economic activity concerned with the processing of raw materials and manufacture of goods in factories." That is precisely what the translation/localization industry has never done. Compared to commonly used terms such as "the automotive industry," "the pharmaceutical industry," or others such as media, marketing, broadcasting, food processing, or video games, the language industry doesn't make the cut in general terms. It is not considered "an industry," and it is often classed as "services." It certainly does not manufacture anything. We could say it processes language versions and, as a result, creates parallel corpora. And that would be about it.

"Language is no longer human" discusses how our traditional understanding of language as something unique to humans, semi-divine in nature, linked to reasoning and knowledge, has polluted the way the $65Bn translation industry has traditionally been conceived by many. Large translation companies began applying economies of scale and technology since the advent of translation memory servers, statistical machine translation, stepping up a gear with neural machine translation that allowed for Adaptive MT, and now GenAI applied to translation processes (with automatic post-editing entering the scene in 2024).

Applying human-quality GenAI to finalize a first machine translation will transform us into a true industry with standard industrial processes similar to manufacturing.

AI will make the language industry a true "industry"

Ironically, the concept of "industry" as defined by the OED has not applied to most of the 14,000 translation companies worldwide. The fact that this industry is so fragmented is evidence of the craft mentality that prevails in it - and also among translation buyers and many translators. Let's face it: machine translation was a dirty word until quite recently, and humans have never been too good at languages past a certain age, as the article discusses. Our brains are not wired to translate and master several languages. Some people can, but despite the number of bilinguals in the world, being fluent in more than 3 languages is not common.

If we all understand the term industry as “processing of raw materials and manufacture of goods in factories”, then words are both our raw materials and our manufactured goods. Language companies (translation companies) are factories. Words are produced by humans as raw materials. They are typically further processed by other humans (using CAT tools or QA/QC software) and offered as a final product.

Innovation has typically come in the form of CMS connectors to collect content from the web, plugins, or machine translation, as a means to speed up translators’ productivity. In 2024, this is not enough.

Celebrating the fact that machine translation is part of AI and that adaptive machine translation proves we are somehow “part of AI” is a shallow analysis. True, many new developments in LLMs and AI in general involve people like Ilya Sutskever (ex-OpenAI) or Aidan Gomez (Cohere) who were part of Google Translate’s team at one point. Incidentally, a patent on NMT was turned down by the European Patent Office in 2023. As far as innovation goes, a $50-65Bn translation industry has nothing to celebrate except the fact that language technology companies have been using Transformers since 2017. The industry, in general, has been extremely slow in adopting technologies that have typically been developed outside of it. Most professional LSPs are waiting to decide where and how to bet their money: indecisive about NMT, looking at Adaptive MT as a half-way solution, and still insisting that LLMs are "not ready for production" or "are too expensive".

This time, the AI hype is different

This time is different because, for the first time in history, we humans are having cognitive experiences with machines. I spend most of my time evangelizing to clients on this point: most of us have had some experience with Virtual Reality glasses over the last few years. We know the images are not real, we know they are avatars of things that don't exist, but we can't help being amazed at how realistic a game looks, how real a medieval city looks... Some people even scream or fall from their armchairs as they live a rollercoaster experience. Our most basic brain activity cannot distinguish the real from the unreal. If we see something, it is happening. Humans have been trying to interpret dreams for millennia, sometimes thinking that if something was dreamt, it would happen, that there was a reason for the dream. Our more advanced mammal brain can reason "this is not real" or "this has been written / translated by a machine" in a second or third line of thought. But the truth is that we humans are not very good at detecting what is real and what is not, and it is becoming increasingly difficult to do so.

From late 2022, machines produce human-grade language

In reality, machines had been producing human-grade language for some time. Translations from machine translation (MT) systems have been very, very good for some time, and MT post-editing has been part of many translation companies’ offerings for quite a while. Well-known figures like Microsoft’s Christian Federmann or Kirti Vashee from Translated argue that NMT is an extremely good solution and state-of-the-art. I agree with both because NMT is sufficient in a large number of cases for the language industry in its current state. I spoke about this in my earlier post Neural Machine Translation versus Prompting-Based LLM Translation - How Close Are We?

But this time is different. The push is to devise a process with as few humans as possible.

LLMs are just a part of the puzzle for Artificial General Intelligence (AGI), and they are just a part of Generative AI (GenAI) systems. However, they have had a profound impact on our traditional understanding of language as a uniquely human attribute. With the ability to generate, interpret, and process language, AI systems, particularly through large language models (LLMs), have begun to perform tasks that were once thought to be the sole domain of humans, such as translation, content creation, and conversational interactions. This shift not only redefines our relationship with language, but also has profound implications for various industries and professions. It heralds a new era of automation and AI-augmented communication, including of course AI-based translation services where humans may not be in the loop anymore, but become quality controllers, working on cultural adaptation and adding world knowledge. They become proof-readers. ISO 17100 (reviewed in 2020) and ISO 18587 (post-editing as it was in 2017) are completely obsolete.

A career in industry and language

Before joining the language industry, from the mid-90s to the early 2000s, I spent quite a few years working as a commissioning engineer in the automobile industry and large engine industry (aero and marine for co-generation). I also had a passion for languages and literature, semantics, the historical evolution of language families, and even interlingua! I enjoyed using language skills, helping manufacturers open new combustion engine production plants or installing RB-211 aero and submarine engines to produce 27MW of electricity and changing people's lives with more stable access to electricity.

I was frequently called to meetings beyond my expertise and age at the time simply because I spoke languages. I found a common theme at many of these meetings: users could not understand the instructions in the manuals of these large machines, the user interface was meaningless, etc.

I could not understand the poor quality control or the lack of automation in processes and in the generation of high-quality translated content. My job was to manage the commissioning of machines that created other machines (cars) with quality control on the inputs and final products. Surely, something could be done. My heart was set on having the technology that would automate language processes and quality control. "Cars have some 8,000 parts and each one is manufactured, QC is run, and the car is assembled. There must be a way to automate language production at scale," I used to think.

Language as an industrial process

Drawing on historical, technological, and sociological perspectives, my article argues that AI's ability to communicate in human languages challenges the notion of language as an exclusive hallmark of human intelligence.

This point is illustrated by highlighting the inherent advantages of AI in overcoming language barriers at scale and in many language combinations and communicating nuances (at least in well-resourced languages). The technology now has the potential to democratize language production and accessibility to information.

Where do we go from here?

"Language is no longer human" addresses the limitations and challenges humans face in mastering new languages. This is exemplified by the story of Henry Kissinger, who kept his native German accent throughout his life while his brother adopted an unmistakably American accent.

I am pointing out the advantages of AI in overcoming such limitations, while at the same time I want to raise critical questions about the essence of our uniqueness as human beings. I suggest that reasoning, creativity, and emotional intelligence will continue to differentiate humans from machines and other mammals, allowing us to face complex social, ethical, and creative challenges in ways that AI, at least for now, cannot replicate. However, the production of linguistic content that requires little or no world knowledge or cultural adaptation is about to become an industrial process with batch quality control by humans. And all of us who work in machine translation know that this will be the case very soon, as adapted LLMs review initial machine translation inputs.

I believe in AI's potential to match and, in some cases, surpass human abilities in language and communication. As CEO, I work every day to stimulate critical thinking and validate my vision with a process in which machines will translate, post-edit automatically, and provide a quality estimation of the content produced (mostly to a human-grade level).
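
For readers who prefer to see the idea as code, the process described above (machine translation, automatic post-editing, quality estimation, and batch quality control by humans) could be sketched roughly as follows. This is only an illustrative sketch, not our production system: the names `Segment`, `run_pipeline`, the 0.8 threshold, and the three toy functions are all hypothetical placeholders standing in for a real MT engine, an adapted LLM post-editor, and a quality estimation model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    source: str
    draft: str = ""      # raw machine translation
    final: str = ""      # automatically post-edited text
    score: float = 0.0   # quality estimate, assumed to lie in [0, 1]


def run_pipeline(
    sources: List[str],
    translate: Callable[[str], str],
    post_edit: Callable[[str, str], str],
    estimate: Callable[[str, str], float],
    threshold: float = 0.8,  # hypothetical cut-off, to be tuned per client
) -> Tuple[List[Segment], List[Segment]]:
    """Translate, post-edit and score each segment.

    Returns (accepted, needs_review): segments scoring below the
    threshold are routed to a human batch quality-control queue.
    """
    accepted: List[Segment] = []
    needs_review: List[Segment] = []
    for text in sources:
        seg = Segment(source=text)
        seg.draft = translate(text)
        seg.final = post_edit(text, seg.draft)
        seg.score = estimate(text, seg.final)
        if seg.score >= threshold:
            accepted.append(seg)
        else:
            needs_review.append(seg)
    return accepted, needs_review


# Toy stand-ins so the sketch runs on its own. They do NOT translate anything.
def toy_translate(text: str) -> str:
    return text.upper()


def toy_post_edit(source: str, draft: str) -> str:
    return draft.strip()


def toy_estimate(source: str, final: str) -> float:
    # Pretend that any non-empty output is good and empty output is bad.
    return 1.0 if final else 0.0


accepted, needs_review = run_pipeline(
    ["hello ", "   "], toy_translate, toy_post_edit, toy_estimate
)
```

The design choice worth noting is that humans appear only at the end, reviewing the `needs_review` batch, which mirrors the way quality control is run on a manufacturing line rather than on every single part.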