How has 2020 affected the language industry?
Slator reports that the industry as a whole has remained unaffected and that some large players have even grown. CSA Research in Boston and other think-tanks, however, reported a fall in orders as lockdowns brought many industries to a standstill, even in Germany.
My take is that higher process automation and machine translation will benefit those organizations that have adjusted well to WFH policies. We are all speeding up the transition to remote work, and by 2024 it is likely that three quarters of all work will be remote. The world was already moving in that direction anyway, but the Covid-19 pandemic has added momentum.
Another trend we can't ignore is security. We can't assume that moving our work or our children's studies to the cloud will have no impact on sensitive data. This concern has translated into joining the very small club of ISO 27001-accredited companies, which assures our clients that their data is handled and stored to the highest security standards. Increasingly, we also offer anonymization services as an added layer of data security.
So 2020, despite all the negative things we all know, has brought three positive things: an increase in digitalization, the rise of automation and AI as drivers, and a heightened concern about data privacy.
Simplicity in user experience is one more point to take into consideration. The increasingly large numbers of workers moving to cloud tools (private or public clouds) need to be able to start working right away and with little training, so solutions with intuitive, clear, minimalistic interfaces will definitely win.
Has any breakthrough in machine translation appeared?
Breakthrough technologies are defined as life-changing or, from the business point of view, technologies that shape economic systems. For instance, Airbnb, Amazon and other innovators created not only businesses but a new division of labor. I am more inclined to think that some technologies will soon reach higher levels of maturity. For example, Natural Language Processing solutions will mature with more market offerings, from text and document classification to anonymization/data masking.
Others will be reborn. For example, the new generation of blockchain technologies is likely to give us a way to move large volumes of document operations into digital form, and that will change how we work with documents forever.
Another interesting technology that we are looking at is artificial intelligence (AI). We hope to start working on implementing it in 2021. I am convinced that the revolution we are about to witness will demand a tenfold increase in productivity, while prices for services are likely to fall by roughly the same proportion.
AI can improve human productivity and automate human routines, creating the conditions for smarter document processing so that humans can focus on higher-value tasks. AI applied to language can offer actionable insights that would otherwise take hours, days or weeks to obtain.
In this respect, our machine translation has seen new usage peaks via our MT API in 2020. We have completed some exciting information extraction projects with financial institutions.
What trends, services or markets will Pangeanic focus on in 2021?
We are already discussing different types of integrations for anonymization (data masking), and I'm also ready to predict that people will choose cloud software and reject tools that aren't available in the cloud. I think it's only natural.
However, when it comes to anonymization, on-premises solutions will prevail. In 2021, Pangeanic will be focusing heavily on integrations. On-premises installations will work for anonymization Docker containers as organizations seek to offer more privacy and add layers of security, but overall, cloud services are the future.
How has 2020 changed PangeaMT technology?
The world and its technologies are evolving very fast. New algorithms, advances in language technologies, evaluation methods, annotation methods and so forth are published practically every week.
Previously, we wanted to create our own tools for everything and offer our clients a whole package, an end-to-end solution. Now we are focusing on language technologies, mostly machine translation and anonymization, with information extraction as a secondary focus.
What’s your vision for Pangeanic’s future? Are you leaving some technologies or areas behind?
We are moving away from government-sponsored projects in order to bring into the commercial arena many of the things we have learnt in the last few years. As for MT functionality, it is no secret that we have added features that make Deep Adaptive a strong choice for organizations looking for custom MT, even with small amounts of data.
True, we have changed and updated the core neural networks that have created over 500 engines for the EU in direct language combinations. It has been a learning path that has automated many processes. The look and feel of ECO, our NLP platform, has improved and will move towards a more intuitive design, too.
Human-machine interfacing, with humans improving machine translation output, will become more and more popular. That means a lot more post-editing, making software work for linguists in several "human-in-the-loop" scenarios.
We don't believe in on-the-fly training. As developers, we know that one sentence, or even one hundred, has little impact on training, and that it is better to retrain once a certain critical mass has been achieved. This is even more so when one wants to customize anonymization solutions with particular items.
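As a rough illustration of this critical-mass approach, the sketch below buffers incoming sentence pairs and only signals a retrain once a threshold is reached. All names and the threshold value are illustrative assumptions, not Pangeanic's actual implementation.

```python
# A minimal sketch of threshold-based retraining: post-edited sentence
# pairs accumulate in a buffer, and a retrain is triggered only once a
# critical mass has been reached. The class name, API and threshold are
# hypothetical, chosen for illustration only.
RETRAIN_THRESHOLD = 10_000  # sentence pairs; assumed value


class TrainingBuffer:
    def __init__(self, threshold=RETRAIN_THRESHOLD):
        self.threshold = threshold
        self.pending = []

    def add(self, source, target):
        """Queue a post-edited sentence pair; return True when it is time to retrain."""
        self.pending.append((source, target))
        return len(self.pending) >= self.threshold

    def drain(self):
        """Hand the accumulated pairs to the trainer and reset the buffer."""
        batch, self.pending = self.pending, []
        return batch
```

The point of the design is that `add` is cheap enough to call on every post-edit, while the expensive retraining step runs only when `drain` hands over a batch large enough to actually move the model.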
Our tool will have integrated editors so users can add new entities they want to anonymize. The value of data, human data that can improve systems, will become even clearer.
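To illustrate how such a user-extensible anonymizer might look, here is a minimal regex-based sketch. The class, its built-in entity patterns, and the `add_entity` hook are all hypothetical assumptions; production anonymization systems combine trained NER models with rule-based patterns like these.

```python
import re

# Hypothetical sketch of a user-extensible anonymizer: a couple of
# built-in entities plus an add_entity() hook, i.e. what an integrated
# editor would call when a user defines a new entity to mask.
class Anonymizer:
    def __init__(self):
        # Deliberately simplistic patterns; the phone pattern requires an
        # international "+" prefix to avoid matching arbitrary digit runs.
        self.entities = {
            "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
            "PHONE": re.compile(r"\+\d[\d\s-]{7,}\d"),
        }

    def add_entity(self, label, pattern):
        """Register a new user-defined entity pattern."""
        self.entities[label] = re.compile(pattern)

    def mask(self, text):
        """Replace every match of every entity with a placeholder tag."""
        for label, pattern in self.entities.items():
            text = pattern.sub(f"[{label}]", text)
        return text
```

For example, a user could register an IBAN pattern via `add_entity("IBAN", r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")`, after which `mask` would replace account numbers with `[IBAN]` alongside the built-in replacements.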