In the first of a series of interviews with Pangeanic's key R&D and technological product team, we hear the "inside" opinions of our developers and visionaries.
In this interview, we talk to Manuel Herranz, CEO of the company, who tells us about some of the new developments being worked on.
There is one really good thing about working in Natural Language Processing and that is the fact that the field is very broad. I would say that AI touches on "very human" areas that we are all familiar with: language, its meaning, trying to solve its ambiguities and polysemies in machine translation, listening to us, and helping us through personal assistants... However, many challenges remain.
Our mission is to combine human intelligence and technology so that people can extract information from data in a way that they otherwise could not. Machine translation, in all its forms, is a challenge that we consider practically solved: users can retrain their machines and create their own private translation "farms."
Now, the question is, what do we do with all that information? We have to search for and identify actors, places, dates, and maybe even actions (verbs) and combinations among them. We want our AI to be able to read the text to discover information that humans have no time to read or process, and that includes, for example, discovering links for groups of law enforcement officers, lawyers, detectives, or economists.
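To make the idea of identifying actors, places, and dates and the "combinations among them" concrete, here is a minimal sketch in Python. It is not our production system: the small `GAZETTEER` dictionary stands in for a trained entity recognition model, and the names, sentences, and function names are all hypothetical, chosen only for illustration. It counts how often two entities appear in the same sentence, which is one simple way to surface candidate links.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer: a real system would use a trained NER model instead.
GAZETTEER = {
    "Ana Ruiz": "PERSON",
    "Carlos Vega": "PERSON",
    "Seville": "PLACE",
    "Valencia": "PLACE",
}

# Simple pattern for dates written like "3 March 2021" (an assumption about format).
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b"
)


def find_entities(sentence):
    """Return the set of (text, label) entities found in one sentence."""
    found = {(name, label) for name, label in GAZETTEER.items() if name in sentence}
    found |= {(m.group(0), "DATE") for m in DATE_PATTERN.finditer(sentence)}
    return found


def link_entities(sentences):
    """Count how often each pair of entities appears in the same sentence."""
    links = Counter()
    for sentence in sentences:
        # Sort so that each pair is always stored in the same order.
        entities = sorted(find_entities(sentence))
        for pair in combinations(entities, 2):
            links[pair] += 1
    return links


# Invented sample text, for illustration only.
sample = [
    "Ana Ruiz met Carlos Vega in Seville on 3 March 2021.",
    "Carlos Vega travelled to Valencia.",
    "Ana Ruiz called Carlos Vega.",
]
links = link_entities(sample)
for pair, count in links.most_common(3):
    print(pair, count)
```

Sentence-level co-occurrence is deliberately crude. It says that two entities are mentioned together, not what the relationship is, so in practice it would be a first filter ahead of deeper analysis.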
NLP is, at its core, a set of machine learning algorithms that understand and even write text.
As I have said before, to me it is one of the most exciting areas of research within AI. It is where it is all happening, and we are heading towards a future where artificial intelligence for text (and voice) will be available as a service, which will accelerate the adoption of the technology.
Every day, without exception, we are confronted with hundreds of pieces of information that we can no longer process. When that information has to be meticulously reviewed, human resources and skills are limited and one of our major constraints is time.
We are working to solve that challenge:
NLP engines with models that can read and summarize long bodies of text, extract key concepts, recognize and identify people who might be mentioned, and identify patterns and structures that human readers might miss.
All this has what I call second-, third-, and fourth-degree implications.
Unforeseen implications: we will be able to know what a text, document, report, etc., in a foreign language is about without the need to translate it (the machine will do that as an internal process). In addition, it will be possible to extract links between people and facts faster.
This can have rapid repercussions in legal environments, like a better understanding of forensic evidence, but also in police intelligence, for example.
It can also allow us to "understand" large amounts of information from our past. Europe is very rich in historical repositories, archives, and libraries, and it would take us 100 lifetimes to read and understand all their contents. Take, for example, the Archive of the Indies in Seville, with all the documentation of three centuries of Spanish presence on the American continent. There is a lot of cross-referenced information to be explored.
In some European projects, we are working to recover the historical memory of the Jewish people, for example.
Speaking of third- and fourth-degree implications, let's imagine judicial processes where evidence is available to prosecutors and parties much faster and with, let's say, "industrial" metrics. Slow court processes can become more agile, which has an impact on the quality of life of citizens. AI, and within it NLP, is here to make our lives easier and better.
Of course. Beyond the intelligence or legal field, a human resources company can preprocess CVs in other languages, even extracting key information, either with words or short abstracts. A financial firm could also use ECO to understand huge amounts of regulatory files in Chinese or Japanese, with their respective abstracts.
The goal would be to identify potential risks for those companies that have exposure to the Chinese market, and I can think of many...
I can't reveal more about our work, but it is no surprise that we will have pre-trained models and will also work with clients to customize models for specific tasks or specific data sets.
We are in the midst of a revolution in natural language processing, and this means that a large number of the reading and writing tasks that still occupy us as humans will be performed by machines for our benefit.
If we couple all of the above to a positive or negative rating, the summed information becomes very powerful. Let's cross-reference data from statements, for example, with a certain person's stock investments, and find the positive or negative sentiment and trend of a listed stock. The words or deeds of certain people can have very high relevance in the stock markets.
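As a rough illustration of coupling statements to a positive or negative rating and summing the result per listed stock, here is a minimal Python sketch. Everything in it is an assumption made for the example: the word lists, the ticker symbols, the statements, and the function names are hypothetical, and a simple word count stands in for a trained sentiment model.

```python
from collections import defaultdict

# Hypothetical word lists; a real system would use a trained sentiment model.
POSITIVE = {"growth", "profit", "strong", "record"}
NEGATIVE = {"loss", "lawsuit", "weak", "recall"}


def score(text):
    """Return positive-word count minus negative-word count for one statement."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_ticker(statements):
    """Sum statement scores per ticker; statements are (ticker, text) pairs."""
    totals = defaultdict(int)
    for ticker, text in statements:
        totals[ticker] += score(text)
    return dict(totals)


# Invented statements and tickers, for illustration only.
statements = [
    ("ACME", "Record profit and strong growth."),
    ("ACME", "A lawsuit was filed."),
    ("BETA", "Weak sales and a product recall."),
]
print(sentiment_by_ticker(statements))
```

A positive total suggests mostly favourable statements about a stock and a negative total the opposite. Tracking how that total moves over time would give the trend mentioned above.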
We are going to witness new forms of intelligence, relationships, and behavior, and new insights, at a speed and scale that until now was neither possible nor accessible to us, simply because we could neither read enough nor find the deep relationships.