Several readers have pointed out the confusing use of "LLM translation" and "NMT translation" in media, talks, and reports. Gartner and industry analysts, for example, speak only about neural machine translation, while solution providers and users in general seem to have moved comfortably to "LLM translation" or "AI translation" for either raw MT or customized, adapted MT. Pangeanic has been recognized in Gartner's Hype Cycle for Language Technologies for two consecutive years (2023 and 2024), where the technology is generally understood as "neural".
"LLM translation" and "NMT" are based on neural networks and, typically, the Transformers technology. The distinction is about specialization and training data, not the fundamental technology; thus, the terminology can be confusing. It is, in fact, a common source of misunderstanding, even among professionals. Understanding the difference between various machine translation approaches can be confusing because they all utilize neural networks, although they operate differently. The confusion stems from terminology overlap and underlying technical similarities. All modern machine translation is neural-based, but the key distinction lies in how these neural networks are trained and deployed. "Neural Machine Translation" (NMT) refers to task-specific, supervised neural models trained exclusively for translation using parallel data that directly maps source language to target language.
In contrast, "LLM translation" is performed by general-purpose neural models trained on a vast array of tasks, including classification, auto-completion, named entity recognition (NER), translation, summarization, and question-answering. While LLMs typically benefit from much larger datasets, they lack the specialized focus of dedicated NMT systems. This difference in training approach and specialization creates confusion when discussing "neural" translation: both methods use neural networks, but with significantly different architectures, training methodologies, and performance characteristics.
| Neural Machine Translation (NMT) | Large Language Models (LLMs) |
| --- | --- |
| Machine translation systems are trained end-to-end on parallel bilingual corpora (source-target sentence pairs). | LLMs are very large neural networks (e.g., GPT-4, Claude, Gemini, Llama, DeepSeek, PaLM) trained on huge monolingual and some bilingual/multilingual data, usually in a self-supervised way (predicting the next token/word). |
| These models are always neural (e.g., encoder-decoder Transformers) and were a significant step up from statistical MT. | Translation is one of many tasks LLMs can perform; even though they are neural models, LLMs are not specialized for translation. |
| Examples: Google Translate (traditional NMT model), MarianNMT, Fairseq models. They typically translate whole sentences or a few consecutive sentences (attention window). | When you ask an LLM to translate, you are leveraging its "generalist" capabilities, not a model optimized explicitly for translation. |
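To make the table's contrast concrete, here is a minimal sketch of dedicated NMT inference, assuming the Hugging Face transformers library and the publicly available Helsinki-NLP/opus-mt-en-es MarianMT checkpoint (illustrative choices, not the only way to run such a model):

```python
# Minimal sketch: invoking a dedicated NMT model (MarianMT) for one fixed
# translation direction. Assumes: pip install transformers sentencepiece torch
from transformers import pipeline

# This checkpoint was trained end-to-end on English-Spanish parallel data;
# it cannot summarize, answer questions, or switch tasks via a prompt.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

result = translator("Neural machine translation is a specialized task.")
print(result[0]["translation_text"])
```

No instructions are passed to the model: the task is baked into its weights, which is precisely what "specialized" means in the table above.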
Neural Machine Translation (NMT) and Large Language Models (LLMs) are both based on neural network architectures, and their training methodologies and application domains have recently begun to converge. "Traditional" NMT models (Sequence-to-Sequence, or Seq2Seq) are trained from randomly initialized weights on large-scale parallel corpora, with the objective of optimizing translation accuracy between source and target languages. General LLMs, by contrast, are trained to keep generating plausible text rather than to reproduce a source faithfully, which is why they can hallucinate. In recent years, however, the machine translation community has increasingly adopted strategies from the broader field of Natural Language Processing, particularly the use of pre-trained language models. This trend is motivated by the demonstrated effectiveness of models such as BERT, which are pre-trained on large amounts of monolingual data and then fine-tuned for specific tasks. In the context of NMT, this approach is particularly valuable for low-resource languages, where parallel data is scarce. Models such as mBART exemplify this methodology: they first undergo multilingual pre-training to recover masked tokens and are then fine-tuned on translation tasks, thereby leveraging transfer learning to improve performance.
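To illustrate this pre-train-then-fine-tune recipe, the sketch below assumes the Hugging Face transformers library and the facebook/mbart-large-50-many-to-many-mmt checkpoint, an mBART model that was first pre-trained multilingually and then fine-tuned on parallel translation data:

```python
# Sketch: translating with an mBART-50 checkpoint obtained via transfer learning
# (multilingual pre-training followed by fine-tuning on parallel data).
# Assumes: pip install transformers sentencepiece torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

tokenizer.src_lang = "en_XX"  # source language code
encoded = tokenizer("Parallel data is scarce for low-resource languages.",
                    return_tensors="pt")
# Force the decoder to start generating in the chosen target language (Spanish).
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["es_XX"],
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```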
In parallel, large generative language models have shown the ability to perform translation without explicit fine-tuning, using prompt-based or “zero-shot” approaches. This has been evaluated systematically since 2023, including studies with GPT-3.5 and GPT-4. Research indicates that these models can produce fluent and competitive translations in high-resource language pairs, particularly when translating into English. However, results are less consistent for low-resource languages, a limitation attributable to the predominance of English and high-resource language data in the training corpus of these models. Benchmarking initiatives such as WMT23 have found that GPT-4 performs comparably to specialized NMT systems for certain translation directions, though domain-specific or low-resource scenarios continue to pose challenges.
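By contrast, a prompt-based ("zero-shot") translation call steers a general-purpose model entirely through natural-language instructions. A minimal sketch with the OpenAI Python client follows; the model name, prompt wording, and parameters are illustrative assumptions, not settings taken from the studies cited above:

```python
# Sketch: zero-shot translation by prompting a general-purpose LLM.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system",
         "content": "You are a professional English-to-Spanish translator."},
        {"role": "user",
         "content": "Translate into Spanish: 'Results are less consistent for low-resource languages.'"},
    ],
    temperature=0,  # reduce output variability for translation
)
print(response.choices[0].message.content)
```

The same model, given a different prompt, would summarize or answer questions instead: exactly the generalist behavior described above.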
In short: NMT systems are specialists, trained end-to-end on parallel data for the single task of translation, while LLMs are generalists for which translation is only one of many capabilities, and the two approaches are steadily converging.
Fun Fact: Did you know that many LLM scientists, such as Ilya Sutskever (former OpenAI Chief Scientist) or Aidan Gomez (Cohere CEO), began their careers in machine translation? Gomez co-authored the Transformer paper ("Attention Is All You Need") at Google, work originally aimed at machine translation. Sutskever wrote and co-wrote several papers on machine translation and encoder-decoder technologies before joining OpenAI. This reflects how closely related both technologies are.
Alan Turing Statue in Sackville Park, Manchester (UK)
Further Reading: What is AI and what is Artificial General Intelligence? (AGI)
2023 will be remembered as the year in which two methodologies emerged as leaders in automatic translation: Large Language Models (LLMs) and Neural Machine Translation (NMT) systems.
While both have revolutionized the process of automatic language translation, they present unique advantages and limitations. This blog post aims to provide a comparative analysis of these two techniques to help elucidate their role in the evolving landscape of machine translation.
LLMs, such as ChatGPT, are large-scale language models trained on vast amounts of text data. They excel at producing grammatically accurate translations and demonstrate a strong grasp of language structure.
Despite their generative versatility, large language models have certain disadvantages compared to traditional Sequence-to-Sequence neural machine translation.
Recommended Reading:
Generative AI: Privacy, Data Protection and How can it Affect your Business
NMT systems translate languages using artificial neural networks, such as Recurrent Neural Networks (RNNs) and Transformer-based models. They are known for their ability to consider broader context and manage idiomatic expressions.
Consistency: NMT systems often produce more consistent translations, especially across longer texts or documents.
Context fluency: NMT models translate sentence by sentence, which can be problematic in highly contextual languages such as Japanese, or in conversations and dialogue, where gender or formality disagreements can appear between sentences.
Struggle with Low-Resource Languages: Like LLMs, NMT systems can struggle to translate low-resource languages accurately due to limited training data.
Inflexibility: Once an NMT system has been trained, it can be difficult to adjust or adapt its performance without complete retraining, which can be costly in time and resources. Pangeanic's Deep Adaptive technology addresses this limitation by making adaptation easy and affordable.
It is important to note that although we have distinguished between LLMs and NMTs here for comparative purposes, there is a significant overlap between both technologies.
Many modern LLMs are created using neural network architectures similar to NMTs. The main difference lies in their training and use: LLMs are trained to predict the next word in a sentence and can generate text, while NMT systems are trained explicitly on bilingual text pairs to translate from one language to another.
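The difference between the two training objectives can be sketched in a few lines of PyTorch-style code; the tensors below are random placeholders standing in for real model outputs, so this is a conceptual illustration of the losses, not a training loop:

```python
# Conceptual sketch of the two training objectives (dummy tensors, PyTorch).
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 1000, 8, 2

# LLM objective: self-supervised next-token prediction on (mostly) monolingual
# text. The label for each position is simply the next token of the sequence.
tokens = torch.randint(0, vocab_size, (batch, seq_len))
lm_logits = torch.randn(batch, seq_len - 1, vocab_size)    # placeholder model output
llm_loss = F.cross_entropy(lm_logits.reshape(-1, vocab_size),
                           tokens[:, 1:].reshape(-1))       # shifted targets

# NMT objective: supervised seq2seq learning on parallel sentence pairs.
# The decoder must emit the reference target-language sentence; in a real
# encoder-decoder model it would be conditioned on the source (unused here).
src = torch.randint(0, vocab_size, (batch, seq_len))        # source sentence
tgt = torch.randint(0, vocab_size, (batch, seq_len))        # reference translation
dec_logits = torch.randn(batch, seq_len, vocab_size)        # placeholder decoder output
nmt_loss = F.cross_entropy(dec_logits.reshape(-1, vocab_size), tgt.reshape(-1))

print(f"next-token loss: {llm_loss.item():.3f}  seq2seq loss: {nmt_loss.item():.3f}")
```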
LLM and NMT systems have their strengths and weaknesses. To get the most out of these technologies, it is essential to understand their respective capabilities and to choose the most appropriate approach depending on the context and the specific requirements.
While LLMs offer high adaptability and can generate grammatically correct texts, they may lack cultural nuances and have limited contextual understanding. NMT systems, with their strong contextual understanding and competent handling of idiomatic expressions, may require more resources and be less flexible.
In conclusion, both systems have made substantial contributions to the field of machine translation, each with its unique strengths and weaknesses. As the field continues to advance, combining these two techniques, leveraging the strengths of each, could pave the way toward even more accurate and nuanced machine translation systems.