AI in Translation and Beyond: Key Takeaways from TAUS Albuquerque 2024

As a leading AI company whose mission centers on knowledge management, retrieval, and multilingual dissemination through AI translation, Pangeanic always keeps an eye on the latest developments in our field. The recent TAUS Albuquerque 2024 conference provided valuable insights into the current state and future of AI in language technology. Many of the conversations focused on the higher levels of automation that can be achieved in producing reliable translations. Artificial intelligence applied to the translation industry, its advantages and disadvantages, and the possibilities and limitations of the technology dominated many of the presentations. Here are our key takeaways.

The Current State of GenAI

Anthony Scriffignano's opening talk highlighted that, while Generative AI has made significant strides, humans don't think or communicate in tokens in an auto-regressive mode, so what we experience with AI is still far from human-like intelligence. GenAI excels at pattern recognition and can solve many specific problems, but it struggles with basic reasoning tasks. This aligns with our perspective at Pangeanic: AI is a powerful tool, but human oversight and judgment remain crucial.

Example: If five seagulls have landed on five spaces in a parking lot, an algorithm will probably decide that the next seagull will also land on a free parking space (such is the power of data). Likewise, seeing carts stacked one on top of another may lead it to conclude that carts can jump or park themselves, rather than that someone has stacked them.

Multilingualism and AI

Marina Pantcheva from RWS shared an interesting observation: teaching an LLM to reason in one language (e.g., English) and then adding another language (like Swahili) can help it generate answers in the new language. This highlights the importance of developing truly multilingual AI systems, a goal we're actively pursuing at Pangeanic.

Kalika Bali, Senior Principal Researcher at Microsoft Research India, described her life's mission: developing speech applications for low-resourced languages in India. She emphasized the need for AI to support a broader range of languages, including those from the Indian subcontinent. At Pangeanic, we recognize the importance of linguistic diversity in building a more multilingual AI: our mission is to create AI-powered custom knowledge retrieval experiences for deep insights and multilingual dissemination.

The Future of Translation

Marco Trombetti, CEO at Translated, suggested that we shouldn't limit ourselves by comparing AI to human intelligence. Instead, we should focus on how AI can enhance our lives and work. At Pangeanic, we share this vision, always striving to develop AI solutions that complement and augment human capabilities.

Trombetti also pointed out the significant potential in the translation market, especially for social media and audiovisual content. This aligns with our mission at Pangeanic: to make high-quality translation more accessible and affordable.

Language-Agnostic AI and Trustworthy Knowledge

This was one of the most interesting sessions. Jochen Hummel (forever known as the creator of Trados, he has since moved into knowledge management by way of knowledge graphs with his company Coreon) raised essential questions about the economics and efficiency of current LLM use, and wondered whether we have plateaued in efficiency. He stated that "we could easily transform our terminology into multilingual knowledge resources." Jochen epitomizes the transition the language industry needs, from managing sentence-level translation to multilingual knowledge management. The discussion touched on biases in language models, particularly their bias towards English-language knowledge. We're addressing this challenge at Pangeanic with our ECOChat system. Driven by user data only, ECOChat provides equal access to knowledge from trustworthy, user-configured sources and data repositories (internal or external).

His session included comments from Kirti Vashee on the bias towards English knowledge: a question about the same person can produce two very different answers, a detailed one in English and a completely unusable one in Korean. This calls into question the experience offered in other languages and to users who are not English speakers. It is precisely the problem that ECOChat solves: equal access to knowledge from trustworthy sources, namely the sources that users configure with their own data and knowledge repositories, so that answers draw only on the user's information and not on generally available Internet data.
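To make the idea of "answers only from user-configured sources" concrete, here is a minimal Python sketch. It is not ECOChat's implementation: the function names, the simple word-overlap scoring, the `min_overlap` threshold, and the sample sources are all hypothetical, chosen only to illustrate the principle of declining to answer when the user's own repositories contain nothing relevant.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def answer_from_sources(question, sources, min_overlap=2):
    """Return (source_name, passage) for the best-matching user source,
    or None when no configured source shares enough words with the question.

    Word overlap is a stand-in for real retrieval (an assumption of this sketch).
    """
    question_tokens = tokenize(question)
    best_name, best_score = None, 0
    for name, passage in sources.items():
        score = len(question_tokens & tokenize(passage))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_overlap:
        return None  # refuse rather than fall back to outside knowledge
    return best_name, sources[best_name]


# Hypothetical user-configured repository, for illustration only.
sources = {
    "hr_policy": "Employees accrue 23 days of paid vacation per year.",
    "it_policy": "Passwords must be rotated every 90 days.",
}

in_scope = answer_from_sources("How many vacation days do employees get?", sources)
out_of_scope = answer_from_sources("Who won the 2010 World Cup?", sources)
```

In this sketch the first question is answered from the `hr_policy` passage, while the second returns `None` because nothing in the configured sources covers it. A production system would replace the word-overlap step with proper multilingual retrieval.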

Kirti Vashee highlighted the focus on scaling up, but raised concerns about how this affects individuals in the Global South who lack access to unlimited NVIDIA GPUs. Jochen stated that "humanizing LLMs" is probably irresistible to the people who use them. Kalika replied that one of the best uses of "current AI" is as a brainstorming tool, as an assistant. Hadar Shemtov from Google said that it would depend on what one thinks about humans: is the LLM the new intern or the authoritative university professor? An LLM is a large correlation engine, and this may fool people into thinking it is intelligent. It has the same advantage that computers have over humans when playing chess: it can explore endless moves and possibilities. (Playing chess was always one of the first challenges AI pioneers tried to solve, from Torres Quevedo to Alan Turing.) Jochen continued by asking if we can trust an LLM to make statements about its own quality. Kirti replied that, indeed, the technology can respond with human-like fluency, but it is not reliable. Hallucinations can be minimized but not eliminated completely.

In the final session the next day, Arle Lommel from CSA Research and I pondered whether it is reliable to have individuals evaluate the quality of their own work. Jochen mentioned automatic post-editing as a trustworthy feature. Kalika noted that LLMs tend to think they are better than they are (like humans, funnily enough!), so they cannot be good assessors of their own output. Hadar jokingly mentioned the challenge of training an LLM "with no ego." One idea might be to imitate what humans do and use the best human translator as the final proofreader. Jochen highlighted the model proposed by ECOChat, which involves having knowledge in one language and then translating it. Hadar explained that some models can curate knowledge from several languages and create the best available answer. He referred to these systems as "end-to-end." According to Kalika, consistency is a major issue, and much of the current research is focused on addressing this challenge. People can tolerate occasional errors and inaccuracies, but inconsistency is far less acceptable. Hadar proposed that models excelling at a single task will likely be the most successful. He suggested that transfer learning might occur, with speech translation potentially being an area where such transfer takes place.

AI in Education and Accessibility

Olga Beregovaya from Smartling highlighted how AI is used to support less popular languages and assist in education. In South Africa, for example, AI is helping students learn in their native languages, improving overall language proficiency. At Pangeanic, we're excited about AI's potential to break down language barriers in education and beyond.

Olga presented some striking statistics: out of the global population of 8 billion, 3 billion are monolingual. Additionally, 75 million people worldwide have a speech disability, and AI assistants from large tech companies often struggle to understand speech from people with disabilities. Oscar Straker's demonstration of VoiceITT showcased how AI can assist people with speech disabilities, an often-overlooked application of language technology. This aligns with our belief at Pangeanic that AI should be used to create more inclusive communication solutions. Focusing on the United States, Oscar advocated for the 5% of US children who experience some kind of disability for at least 12 months. To demonstrate the challenges in speech recognition, he ran a live vocabulary training session. In this demonstration, the system initially misrecognized the word "Albuquerque" as "Albert Cookie," highlighting the need for improved AI understanding of diverse speech patterns.

The Changing Landscape for Language Service Providers

Gráinne Maycock from Acolad noted that smaller Language Service Providers (LSPs) might struggle with increasing client expectations for sophisticated technological solutions. Her statement sheds light on a pressing issue in the language services industry as it grapples with the rise of AI and machine translation. Smaller LSPs find themselves in an increasingly precarious position, caught between clients' expectations of more sophisticated technological solutions and the rapid advances in AI-driven translation capabilities. At Pangeanic, we see this as an opportunity for collaboration and innovation in the industry.

The challenge for these smaller LSPs lies in their limited ability to differentiate themselves in a market where basic translation services are becoming ever more commoditized. As larger LSPs and tech companies invest heavily in AI and machine translation technologies, smaller providers often lack the resources to keep pace. This technological gap is creating a divide in the industry, with many smaller LSPs struggling to maintain their relevance and profitability.

Wrapping Up TAUS Albuquerque - The Experts' View

I shared a panel with Arle Lommel from CSA Research and Language Technology Expert Anne Huehls, chaired by Mark Seligman from Spoken Translation, whom I greatly admire. The program said that our session would close and wrap up the conference, dealing with "what does this mean for our day-to-business? What can we take away from this TAUS conference and recommend to the attendees as work-to-do before we meet again next year? The panel consists of market watchers, researchers and consultants." I'm happy TAUS can classify me as a researcher, a consultant and an industry watcher (or insider).

The future for language companies likely involves a significant shift in their business models and service offerings. Many will need to consider partnering with other LSPs or technology providers to offer the level of sophistication that clients now expect. This may include completely new workflows in which the convergence of MT and TM may make sense at some level, or revolutionary new ways of providing language services. In any case, this is already leading to a wave of consolidation in the industry, with mergers and acquisitions becoming more common as companies seek to pool resources and capabilities. A week after TAUS Albuquerque, independent TMS provider MemoQ acquired Globalese (a machine translation company) and Proprio acquired United Language Group.

The times ahead are not for the faint-hearted. However, this challenging landscape also presents opportunities for those LSPs willing to adapt and innovate. The key may lie in specializing in niche markets or offering high-value services that go beyond basic translation. This could include focusing on industries where human expertise is still crucial, such as legal or medical translation with the help of AI, or offering services like transcreation and cultural consulting that require a nuanced understanding of language and culture. Some may follow Jochen Hummel's and Pangeanic's lead and build solutions for multilingual knowledge management and multilingual dissemination.

Moreover, language companies can position themselves as expert partners in navigating the complex world of cultural adaptation and machine translation post-editing at scale, in AI training data, data curation and data annotation for language models, or in consulting on when and how to use machine translation effectively.

The future of language services in an AI-driven world is likely to be one of hybrid solutions, combining the efficiency of AI machine translation with the nuance and expertise of human linguists. Successful LSPs will be those that can seamlessly integrate these elements, offering clients comprehensive language solutions that leverage the best of both human and artificial intelligence.

The TAUS Albuquerque 2024 conference reinforced many principles guiding our work at Pangeanic. As we continue to develop our AI solutions for knowledge retrieval, dissemination, and translation, we remain committed to creating technology that is multilingual, trustworthy, and accessible. The future of AI in language technology is bright, and we're excited to be at the forefront of these developments.

Stay tuned for more updates on how Pangeanic applies these insights to create cutting-edge AI solutions for language and knowledge management.
