AI will read text to discover information for you

In the first of a series of interviews with the key members of Pangeanic's R&D and technology product team, we get the "inside" view of our developers and visionaries. In this interview, we talk to Manuel Herranz, CEO of the company, who tells us about some of the new developments the company is working on.

Manuel, what innovations are you working on in NLP?

One very good thing about working in Natural Language Processing is that the field is very broad. Within Artificial Intelligence, I would say it touches on "very human" areas that we are all familiar with: language, its meaning, resolving its ambiguities and polysemy in machine translation, listening to us and helping us through personal assistants... However, many challenges remain ahead of us.

Our mission is to combine human intelligence and technology so that people can extract information from data in ways they otherwise could not. We consider machine translation, in all its forms, to be a practically solved challenge: users can retrain their engines and create their own private translation "farms". Now the question is, what do we do with all that information? We have to search it and identify actors, places, dates, maybe even actions (verbs), and the combinations among them. We want our AI to be able to read text and discover information that humans no longer have time to read or process. That includes, for example, discovering links that a group of law enforcement officers, lawyers, detectives or economists would struggle for weeks to find.

Why is this important and what impact does it have on society?

NLP is, at its core, a set of machine learning algorithms that understand and even write text. As I've said before, to me it's one of the most exciting areas of research within AI; it's where it's all happening. We're heading towards a future where artificial intelligence for text (and voice) will be available as a service, which will accelerate the adoption of the technology.

Every day, without pause, we are confronted with hundreds of pieces of information that we can no longer process. When that information has to be meticulously reviewed, human resources and skills are limited, and time is one of our major constraints. We are working to solve that challenge with NLP engines whose models can read and summarize long bodies of text, extract key concepts, recognize and identify the people mentioned, and detect patterns and structures that human readers might miss.

All this has what I call second-, third- and fourth-degree implications, which are often unforeseen. We will be able to know what a text, a document or a report is about in a foreign language without needing to translate it (the machine will do that as an internal process), and we will also be able to extract links between people and facts much faster. This can have very rapid repercussions in legal environments, for example in understanding forensic evidence, but also in police intelligence. It can also allow us to "understand" large amounts of information from our past. Europe is very rich in historical repositories, archives and libraries whose contents would take 100 lifetimes to read and understand, and even then we would not find the links. Take, for example, the Archive of the Indies in Seville and all its documentation on three centuries of Spanish presence in the American continent. There is a lot of cross-referenced information to be explored there. In some European (EU) projects, for example, we are working to recover the historical memory of the Jewish people.

Speaking of third- and fourth-degree implications, imagine judicial processes where evidence reaches prosecutors and the parties much faster and with what we might call "industrial" metrics. Slow court processes can become more agile, and that has an impact on citizens' quality of life. AI, and NLP within it, are here to make our lives easier and better.

Can you give us other more concrete examples?

Of course. Beyond the intelligence and legal fields, a human resources company can preprocess CVs in other languages and even extract key information, either as keywords or as short abstracts. A financial firm could also use ECO to understand huge volumes of regulatory documents in Chinese or Japanese, along with their abstracts. The goal would be to identify potential risks for companies with exposure to the Chinese market. I can think of many more examples.

I can't reveal more about our work, but it will come as no surprise that we will offer pre-trained models and will also work with clients to customize models for specific tasks or data sets.

We are in the midst of a revolution in natural language processing, and this means that many of the reading and writing tasks that still occupy us as humans will be performed by machines for our benefit.

What other areas are important in your development?

Sentiment analysis. If we combine all of the above with a positive or negative rating, the resulting information becomes very powerful. For example, we could cross-reference a certain person's public statements with their stock investments and determine the positive or negative sentiment and the trend of a listed stock. The words or deeds of certain people can be highly relevant to the stock markets.

Do you have a last message, Manuel?

We are going to witness new forms of intelligence, relationships, behavior and insight, at a speed and scale that until now were neither possible nor accessible to us, simply because we could neither read enough nor find the deep relationships.