AGI? Scaling LLMs was never the path to useful Artificial Intelligence

Written by Manuel Herranz | 09/30/25

For the last few years, the world has been mesmerized by the promise of “Artificial General Intelligence.” Investors, governments, and executives have poured billions into projects under the assumption that scaling Large Language Models (LLMs) would bring humanity closer to AGI. The result: only task-specific NLP solutions, agentic workflows, and solid orchestration and engineering are going to work. And only the companies customizing solutions with the right data, often deployed on-premises, are making money.

At Pangeanic, we resisted labeling our work “AI” from the start. Not because it isn’t: our solutions are powered by neural networks, from machine translation to natural language processing (NLP) tasks and advanced automation. We resisted it because we have been in language technology for decades and we understand the possibilities and the limitations. We knew from experience that NLP requires engineering, linguistic expertise, and robust processes, not blind faith in scaling models.

The illusion of talking machines

The launch of ChatGPT, based on GPT-3.5, in late 2022 was a watershed moment. For the first time, the general public experienced a system that felt almost cognitive: a machine that talked back to us and seemed to know a lot about many things. I boldly wrote an article for Multilingual magazine entitled "Language is No Longer Human" and I still stand by it: we have machines that can generate language just as well as we humans can (a lesson in humbleness). What really separates us from other species (and now from devices) is not our ability to communicate through language, but abstract thinking. If this is clear, then many of the tasks, functions, jobs (at risk), promises, investments, and R&D projects can be directed onto the right path. There will be significantly less waste and a reduced need for massive data centers to power LLMs acting as large general-knowledge libraries with search capabilities. With the advent of more electric cars and clever (AI) home appliances, power generation is one of the businesses to be in.


Apart from the power generation business, the money was always about grabbing a slice of the search market.

For those not technically in the know, the LLMs we are so familiar with were created by combining:

  • Transformer architectures originally built for language translation and classification,

  • massive monolingual training at internet scale,

  • books sourced from "many sources", as well as content from newspapers (you can see a breakdown of the initial training sets in our MONOLINGUAL DATASETS FOR LARGE LANGUAGE MODELS article, covering models up to GPT-3.5, when transparency on data acquisition stopped being the norm), and

  • a traditional chatbot enhanced with instruction-following abilities.

With these ingredients, OpenAI created a tool that gave people the sensation of “talking to intelligence.”
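To make the recipe concrete, here is a minimal sketch using the Hugging Face transformers library (gpt2 is just a small public checkpoint standing in for any pretrained model). Under the hood, the model only ever predicts the next token; the instruction-tuned chat layer wraps that same loop.

```python
from transformers import pipeline

# A pretrained transformer is, at its core, a next-token predictor.
lm = pipeline("text-generation", model="gpt2")

prompt = "The Transformer architecture was originally designed for"
result = lm(prompt, max_new_tokens=30, do_sample=False)
print(result[0]["generated_text"])

# Instruction-tuned variants wrap this same next-token loop in a chat
# template; that layer is what produces the feeling of "talking to
# intelligence".
```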

But this was an illusion. Our brains do not distinguish between dreams, hallucinations, or machine-generated text. We projected human-like cognition onto software.

The result?

The cognitive experience was powerful. But it was engineering cleverness, not AGI.

The hard data: 88% of AI PoCs fail to reach production

By 2024, the excitement was colliding with reality. Both McKinsey and Gartner reported that over 80% of AI proofs of concept failed. CIO.com could not put it any better: "Unclear objectives, insufficient data readiness, and a lack of in-house expertise are sinking many AI proofs of concept. So too are zealous POC greenlighting and misguided pressure from the top.

The proof of concept (POC) has become a key facet of CIOs’ AI strategies, providing a low-stakes way to test AI use cases without full commitment.

But as enterprises increasingly experience pilot fatigue and pivot toward seeking practical results from their efforts, learnings from these experiments won’t be enough — the process itself may need to produce more targeted success rates.

Recent research from IDC, undertaken in partnership with Lenovo, found that 88% of observed POCs don’t make the cut to widescale deployment. For every 33 AI POCs a company launched, only four graduated to production, IDC found.

“The high number of AI POCs but low conversion to production indicates the low level of organizational readiness in terms of data, processes and IT infrastructure,” IDC’s authors report. “Half of the organizations have adopted AI, but most are still in the early stages of implementation or experimentation, testing the technologies on a small scale or in specific use cases, as they work to overcome challenges of unclear ROI, insufficient AI-ready data and a lack of in-house AI expertise.”

Analysts tracking generative AI have found a similar pattern, noting a strong desire among companies to leverage Gen AI, but worries about the various errors that make it difficult to take the technology to the next level."

I presented these figures at TAUS Albuquerque and ValgrAI in Valencia (the video below is in Spanish, with the PowerPoint presentation in English; it is easy to get English subtitles on YouTube).


The reasons for failure were clear: unrealistic expectations. Executives assumed LLMs could:

  • classify and extract information flawlessly,

  • translate across domains with cultural nuance,

  • or act as “co-pilots” in any professional workflow.

But LLMs are limited by what all statistical models are limited by: training data.

The cultural dimension: LLMs beyond English

Governments quickly realized another weakness: the cultural dominance of English-trained models. Efforts multiplied to create localized LLMs in Arabic, Catalan, Chinese, and African languages.

This was a positive shift. LLMs can indeed help with cultural preservation. But expecting them to replace professions or, as some claimed, to write 90% of the world’s code in six months, was pure hype.

Business Insider, March 2025: "Anthropic's CEO says that in 3 to 6 months, AI will be writing 90% of the code software developers were in charge of".

If there is one truly exponential phenomenon today, it is not productivity: it is the rapid doubling of AI-generated misinformation.

The engineering reality everyone ignored

However, my message is not that LLMs are useless. They are highly valuable as knowledge repositories and can be relied upon for certain tasks. But at massive scale they are costly and inefficient, and on their own they do not solve specific problems.

For example, translating and reconstructing just one page using Python XML libraries may consume 1–1.2 million tokens (input and output), before terminology verification or translation memory matching.

That is not scalable for AI translation companies.
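A back-of-the-envelope calculation makes the point; the per-token prices below are illustrative assumptions, not a quote from any provider:

```python
# Rough cost of LLM-based translation plus page reconstruction, using
# the 1-1.2 million token figure above. Prices are assumptions chosen
# only to illustrate the order of magnitude.
TOKENS_PER_PAGE = 1_100_000        # midpoint of the 1-1.2M token range
PRICE_IN = 3.00                    # assumed USD per 1M input tokens
PRICE_OUT = 15.00                  # assumed USD per 1M output tokens

# Assume tokens split roughly evenly between input (source page plus
# instructions) and output (the translated, reconstructed page).
input_cost = (TOKENS_PER_PAGE / 2) / 1_000_000 * PRICE_IN
output_cost = (TOKENS_PER_PAGE / 2) / 1_000_000 * PRICE_OUT
print(f"~${input_cost + output_cost:.2f} per page, "
      "before terminology checks or TM matching")
# -> ~$9.90 per page, repeated across thousands of pages per project.
```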

Engineering requires more than personal anecdotes of ChatGPT success. Just as an engineer would never build a bridge simply because a cement mixer worked well, serious NLP requires system design, verification, and domain expertise.

Pangeanic’s perspective: NLP first, hype later

We have never claimed that Deep Adaptive AI Translation is a “translation killer,” let alone a “translator killer” (some misinformed translators have vented their opinions about this and about AI taking their jobs, and I completely understand their anger; I did not understand the anger of translators who refused to use translation memories in the 1990s because they wanted to avoid repetitions. It sounds like a joke nowadays, but it is true). What does the “language technology” world need? AI Translation built on NMT (LLM translation is a kind of NMT, with much longer context and thus greater fluency), refined with task-specific adaptation, and grounded in linguistic and engineering expertise.
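To show what task-specific NMT looks like in practice, here is a minimal sketch with a public example checkpoint (Helsinki-NLP/opus-mt-en-es); it stands in for, and is not, our adapted production models:

```python
from transformers import pipeline

# A compact encoder-decoder transformer trained to do one job: translate.
# Small, fast, and predictable compared with a general-purpose chat LLM.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

print(translator("Language technology is engineering, not magic.")[0]["translation_text"])
```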

From the beginning, we recognized that the AGI narrative was unrealistic. Hallucinations persist. General-purpose models cannot be controlled. Only task-specific models, for tasks such as classification, sentiment analysis, and translation, deliver consistent results.
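A small dedicated classifier illustrates the difference: it is predictable precisely because it does one thing. A minimal sketch with a public checkpoint:

```python
from transformers import pipeline

# A task-specific model: small, cheap, and consistent on a single task.
# The checkpoint is a public example, not a production recommendation.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The delivery was late and support never answered."))
# -> [{'label': 'NEGATIVE', 'score': 0.99...}]
```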

This is why at Pangeanic we focus on:

  • Smaller, specialized systems and models that can be controlled,

  • Multidisciplinary engineering teams,

  • Practical NLP applications that deliver measurable value.

Takeaways: Bet on reality, not illusion

The AGI dream sold well to investors and the media, but it was always a mirage.

The winners in this industry will not be those chasing speculative AGI milestones. They will be the companies building practical, engineered NLP solutions that solve real problems today.

At Pangeanic, that has always been our path: AI as engineering and language technology—not illusion.

For us at Pangeanic, Basque, a language spoken by only about half a million people and with scarce data, represents the ultimate challenge and inspiration. It shows that even the most complex, isolated languages can thrive in the modern world with passion and dedication. It is a living link to our European past, and its survival is a victory for the shared cultural heritage of all humanity. We not only offer human Basque translation services, but also Basque machine translation, including our Deep Adaptive AI Translation.