
23/06/2025

TAUS Massively Multilingual Summit Dublin 2025: Reflections from Google’s European HQ

The TAUS Massively Multilingual AI Conference in Dublin on June 10-11, 2025 proved to be a pivotal moment for our industry, bringing together leaders from AI labs, globalization directors, and language technology providers at Google’s European headquarters. As someone who has been developing language technologies since 2009, I found this gathering particularly significant for its focus on the convergence of AI and multilingual accessibility.

The Data Collection Reality Check

The opening keynote by Dinesh Tewari from Google DeepMind set an ambitious tone, discussing massive multilingual AI projects across languages and modalities in his native India. His work capturing the speech landscape of a country with 1300+ languages highlighted a crucial challenge we face: the dependency on large-scale manual data collection for low-resource languages.

During our first panel discussion, this became a central theme. Maite Melero from the Barcelona Supercomputing Center (BSC), a much-admired colleague with whom we have collaborated on many projects (iADAATPA-MT Hub, NEC TM and MAPA, among others), shared fresh insights from the AINA project and the Salamandra multilingual models her HPC center has built, with an emphasis on the low-resource co-official languages of Spain: Catalan and its dialectal varieties, Basque and Galician, alongside Spanish itself. The Salamandra models have been trained on over 35 languages. Maite stressed community-powered data collection for Catalan, describing how BSC launched a media campaign and a TV call via local broadcasters that resulted in thousands of hours of collected speech and text.

Ricardo Rei from Unbabel brought an interesting perspective on balancing performance between translation-specific tasks and general-purpose capabilities in their Tower LLM. This speaks to a broader challenge we’re all grappling with—how to create models that excel in specialized domains without compromising broader functionality.

Alongside Ricardo, I shared insights on quality metrics and quality estimation for MT. “We are fine-tuning specific small language models for translation and other NLP tasks (anonymization, document classification, summarization into a different language, speeding up the creation of Knowledge Graphs…). With ECOChat, we have an intelligent bot for knowledge management and multilingual information retrieval with a flexible, agentic (multi-agent) configuration. ECOChat allows for non-localization (or “no translation”) as an organization’s knowledge becomes fully multilingual. A lot of our current work also goes into custom data collection for speech systems and low-resource text collection.”

The Infrastructure Paradigm Shift

One of the most thought-provoking moments came during the “Redesign of Enterprise Globalization” panel hosted by Jochen Hummel from Coreon. Richard Tunnicliffe from Google noted how our interaction with content is shifting from traditional screen-based navigation to agent-driven interfaces that digest, translate, and deliver content adaptively.

This led me to pose what Arle Lommel and Julie Belião called one of the conference’s most provocative questions: “Do we even need to translate websites anymore?”

The response was immediate and profound. Julie Belião captured it perfectly: if agents become the primary interface, we stop translating static pages and start adapting data structures, service responses, and interaction flows on-demand, per user, per context.

This doesn’t eliminate the need for multilingual infrastructure—it moves it lower in the stack and makes it more technical. The delivery layer must still internationalize content, handle placeholders, respect locale-specific logic, and validate dynamic output across modalities.
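
To make that concrete, here is a minimal Python sketch of what locale handling at the delivery layer can involve once there is no translated page to serve, only structured values to render per user and per locale. The locale rules, data structures and function names are illustrative assumptions, not taken from any product discussed at the summit.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative locale rules; a real delivery layer would draw these from CLDR/ICU data.
LOCALE_RULES = {
    "en-IE": {"date_fmt": "%d %B %Y", "decimal_sep": "."},
    "es-ES": {"date_fmt": "%d/%m/%Y", "decimal_sep": ","},
}

@dataclass
class RenderRequest:
    template: str   # e.g. "Your order of {qty} items ships on {ship_date}."
    values: dict    # raw, locale-neutral values returned by a service
    locale: str

def render(req: RenderRequest) -> str:
    """Fill placeholders with locale-aware formatting and fail loudly on gaps."""
    rules = LOCALE_RULES[req.locale]
    rendered = {}
    for key, value in req.values.items():
        if isinstance(value, date):
            rendered[key] = value.strftime(rules["date_fmt"])
        elif isinstance(value, float):
            rendered[key] = f"{value:.2f}".replace(".", rules["decimal_sep"])
        else:
            rendered[key] = str(value)
    # .format raises KeyError if the template references a value we never received,
    # which is exactly the kind of validation an agent-driven delivery layer still needs.
    return req.template.format(**rendered)

print(render(RenderRequest(
    template="Your order of {qty} items ships on {ship_date}.",
    values={"qty": 3, "ship_date": date(2025, 6, 23)},
    locale="en-IE",
)))
```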

Beyond Translation: Knowledge Management

Manuel’s core thesis, which resonated throughout the conference, is that the translation industry has always been fundamentally about two things:

• Understanding a message

• Delivering a message

 

Whether it’s consuming knowledge or spreading knowledge, humans have traditionally been the interpreters. Now, with LLMs (and we deliberately avoid calling them “AI” in the broader sense), we’re seeing a shift toward automated interpretation with varying degrees of human oversight.

The uncomfortable truth our industry must confront is this: there are many cases—documentation, help systems, legacy knowledge—where localization ROI simply doesn’t justify traditional translation workflows. Perhaps the future lies not in translation but in knowledge management?

This perspective aligned with observations from CSA Research’s Arle Lommel, who noted that with agentic AI logic, personal agents will interact with and parse data in ways specific to individuals, making translation just one of many possible interactions.

Quality and Risk Management Evolution

The conference reinforced that quality estimation and automated post-editing are improving, but truly generic models still underperform compared to those with specific training data. As Amir Kamran noted, analyzing false positives and false negatives remains critical for risk minimization.
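
As a toy illustration of that point, the sketch below gates machine-translated segments on a quality-estimation score and then counts false positives (segments that passed QE but a human would reject) and false negatives (segments flagged unnecessarily). The scores, labels and threshold are invented for the example; they are not from any system presented in Dublin.

```python
# Hypothetical segment-level QE gating: (qe_score in [0, 1], human_ok) pairs are illustrative.
segments = [
    (0.92, True), (0.88, False), (0.40, True), (0.35, False), (0.75, True),
]

THRESHOLD = 0.70  # segments at or above this score skip human review

false_positives = []  # QE let the segment through, but a human would have rejected it
false_negatives = []  # QE flagged the segment, but it was actually fine

for score, human_ok in segments:
    passed = score >= THRESHOLD
    if passed and not human_ok:
        false_positives.append((score, human_ok))
    elif not passed and human_ok:
        false_negatives.append((score, human_ok))

# False positives are the risk side (bad content reaches users);
# false negatives are the cost side (needless human review).
print(f"FP rate: {len(false_positives)/len(segments):.0%}, "
      f"FN rate: {len(false_negatives)/len(segments):.0%}")
```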

The notion that translation quality is fundamentally a risk management function has moved from controversial to accepted wisdom. However, our industry is still working through the implications.

Paul McManus from Google articulated this well: Quality equals Consistency. If we can eliminate variance in our content processes and produce consistent output reliably, this should be considered high quality, even if it doesn’t meet traditional human-led gold standards.

The Talent and Industry Evolution

Mark Jones from Comtec Translations captured an important shift happening over the past 12-18 months. The role of Language Service Providers is evolving toward Global Content Service Providers (GCSPs), requiring enterprise-grade capabilities and technology agnosticism.

Esther Curiel’s (Zoetis) concept of “frictionless” solutions resonated with our approach at Pangeanic—enabling teams to work autonomously and adopt technology more quickly while maintaining cultural intelligence.

The AI revolution is changing the talent landscape significantly. Success requires combining linguistic expertise with technology proficiency, strategic communication skills, and adaptability—qualities our industry has developed over decades.

The European AI Landscape

An interesting tension emerged around European AI initiatives. Ricardo Rei’s involvement with both EuroLLM and OpenEuroLLM projects raises questions about whether we’re seeing genuine European AI leadership or simply replicating the centralization patterns we’ve criticized in U.S. models.

The recent loss of U.S. federal funding for Mozilla’s Common Voice project, despite its crucial role in speech diversity, highlighted the fragility of open, public-good AI data initiatives.

Contest Winners and Innovation

The TAUS innovation contest showcased practical applications of multilingual AI:

Winner: Ziyi (Ian) Zhang from Zoetis presented Project Babel Fish, addressing a unique challenge in veterinary clinical consultation reports filled with shorthand, typos, brand names, and local vet terminology. Their low-touch terminology list approach—curated, scalable datasets helping AI achieve contextually appropriate translations—demonstrates practical innovation in specialized domains.

Runner-up: Bruno Bitter’s Blackbird platform impressed with its real-time demonstration of repurposing raw content into multilingual, multimodal assets using tools like Fireflies, Airtable, and RAG-based glossary injection.
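
Blackbird’s internals were not shown, but “RAG-based glossary injection” generally means retrieving the terminology entries relevant to a given source text and injecting them into the translation prompt. The sketch below, with an invented glossary and prompt wording, illustrates the general idea rather than Blackbird’s implementation.

```python
# Minimal sketch of RAG-style glossary injection into a translation prompt.
# The glossary, matching logic and prompt wording are illustrative only.
GLOSSARY = {
    "purchase order": "orden de compra",
    "lead time": "plazo de entrega",
}

def build_prompt(source_text: str, target_lang: str) -> str:
    # "Retrieve" only the glossary entries that actually occur in the source text.
    hits = {src: tgt for src, tgt in GLOSSARY.items() if src in source_text.lower()}
    term_block = "\n".join(f"- {src} => {tgt}" for src, tgt in hits.items())
    return (
        f"Translate into {target_lang}. Use these approved terms:\n"
        f"{term_block}\n\nText:\n{source_text}"
    )

print(build_prompt("Please confirm the lead time for this purchase order.", "Spanish"))
```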

Moving Forward: Infrastructure Over Translation

Julie Belião’s closing observation was particularly insightful: if agents become our primary content interface, access remains the goal, but the path now runs through infrastructure, training data, and architecture rather than traditional translation workflows.