Data Protection and Anonymization in the Context of Financial and Legal Services

The GDPR obliges organizations, companies and institutions to protect citizens’ data so that it is not used by third parties, and to minimize the data they release. On the basis of this obligation, INEA (Innovation and Networks Executive Agency) has awarded almost €1M to Pangeanic. This award is aimed at developing a multilingual anonymization toolkit for European public administrations based on the AI processing of texts in the fields of health, life sciences, and justice. The question is: To what extent are we equipped to comply with the GDPR in the financial and legal sector when the personal data of our clients is handled on a large scale?

Personal data protection goes beyond simply seeking to avoid situations in which other unauthorized persons access the private data of an individual. It is about people’s basic right to have their privacy protected against the violations of such rights that may arise from their personal data being collected or stored by companies or Public Administrations.

Personal data can be defined as any numerical, alphabetical, graphical, photographic or acoustic information, or information of any other kind, relating to an identified or identifiable natural person. In other words, any information that either reveals data about a specific natural person or that, through said information, could lead to the identification of the person, is classed as personal data. This includes paychecks, addresses, telephone numbers, email addresses, photographs, medical check-ups, names and surnames, religious beliefs, political views, bank account details and, in some cases, professions and family relations, etc. However, data such as the Tax Identification Number of a legal person is not considered personal data.

There are certain kinds of data that, due to being particularly sensitive, affect people’s privacy. According to the law, data related to a person’s ideology, trade-union membership, religion, beliefs, ethnic background, health and sex life is considered especially sensitive. For these kinds of data, it is necessary for businesses and government to implement special security measures.

Example of text anonymization software using the redaction technique

This is where the anonymization of personal data through new Big Data techniques comes into play. It offers greater security to any company that handles clients’ personal data on a large scale, and in this context its value is inestimable.

Here we must differentiate between absolute anonymization, which is very difficult to achieve because there is always a risk that cross-referencing with other information will link the data back to the person it concerns, and the more practical term “de-identification”. Given that anonymization can never be absolute, it must at least be ensured that re-identification through such cross-referencing, or “traceability to the data subject”, requires so much effort that it is not realistically feasible for anybody to attempt it. That is what we are currently working on in our technology division, PangeaMT.
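To make the re-identification risk concrete, here is a minimal sketch in Python. It is not part of MAPA or PangeaMT. The records, field names and the `reidentify` helper are all hypothetical, and the data is fictional. It shows how a "de-identified" record can be traced back to a person when it keeps details (here a postcode and a birth year) that also appear in another data set.

```python
from collections import defaultdict


def reidentify(released, register, keys):
    """Return (name, record) pairs for released records that match
    exactly one person in the register on the given keys."""
    # Index the register by the combination of shared fields.
    index = defaultdict(list)
    for person in register:
        index[tuple(person[k] for k in keys)].append(person["name"])

    hits = []
    for record in released:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is traceable
            hits.append((names[0], record))
    return hits


# Fictional "de-identified" release: names removed, other details kept.
released = [
    {"postcode": "46001", "birth_year": 1980, "diagnosis": "asthma"},
    {"postcode": "46002", "birth_year": 1975, "diagnosis": "diabetes"},
]

# Fictional second data set that still contains names.
register = [
    {"name": "Person A", "postcode": "46001", "birth_year": 1980},
    {"name": "Person B", "postcode": "46002", "birth_year": 1975},
    {"name": "Person C", "postcode": "46002", "birth_year": 1975},
]

hits = reidentify(released, register, keys=("postcode", "birth_year"))
for name, record in hits:
    print(name, "->", record["diagnosis"])
```

In this toy data the first record matches a single person and is therefore traceable, while the second matches two people and stays ambiguous. Raising the number of people who share the same combination of details is one way to increase the effort that re-identification requires.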

The MAPA (Multilingual Anonymization toolkit for Public Administrations) project makes use of state-of-the-art Natural Language Processing tools such as named-entity recognition based on neural networks and bilingual encoding to develop an open-source solution with a focus on the medical and legal domains, deploying it at several Public Administrations in Europe, including Spain, France, Latvia and Malta.

“The aim of MAPA is to provide a private data anonymization service so that such data can be shared among organizations while protecting private or confidential data. Implementation cases will focus on de-identifying, obfuscating or pseudo-anonymizing personally identifiable information. Thanks to artificial intelligence (AI), it doesn’t matter what language the Public Administration or other users deal with; there will always be a solution. MAPA will allow PAs to comply with the GDPR with a high degree of certainty and protect the private data of a person while retaining the practicality of Big Data.”
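As a rough illustration of the difference between redacting and pseudonymizing, the sketch below uses plain regular expressions from the Python standard library. This is an assumption made for brevity and is not how MAPA works: the project relies on neural named-entity recognition, which can also find names and other entities that simple patterns miss. The patterns, labels and function names are hypothetical.

```python
import hashlib
import hmac
import re

# Hypothetical, deliberately simple patterns. They do not detect names.
PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "IBAN": r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,8}\b",
    "PHONE": r"\+?\d[\d ()-]{7,}\d",
}

# One combined pattern so the text is scanned in a single pass.
COMBINED = re.compile(
    "|".join(f"(?P<{label}>{pattern})" for label, pattern in PATTERNS.items())
)


def redact(text):
    """Replace each detected item with its category label."""
    return COMBINED.sub(lambda m: f"[{m.lastgroup}]", text)


def pseudonymize(text, key):
    """Replace each detected item with a keyed token, so the same value
    always maps to the same token without revealing the value."""
    def token(match):
        digest = hmac.new(key, match.group().encode("utf-8"), hashlib.sha256)
        return f"[{match.lastgroup}_{digest.hexdigest()[:8]}]"
    return COMBINED.sub(token, text)


sample = (
    "Contact Ana at ana.perez@example.com or +34 600 123 456. "
    "IBAN: ES91 2100 0418 4502 0005 1332."
)
print(redact(sample))
print(pseudonymize(sample, key=b"demo-key-not-for-production"))
```

Redaction removes the values entirely, whereas pseudonymization keeps a consistent placeholder, which preserves the ability to tell that two mentions refer to the same item. Note that the name "Ana" survives in both outputs, which is exactly the kind of gap that named-entity recognition is meant to close.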

MAPA is intended to meet GDPR requirements at scale. Although no software can guarantee 100% accuracy, it will make the exchange of documents much quicker, more dynamic and more efficient. It will also help ensure compliance with the GDPR and with other applicable law. Have companies in sectors such as financial institutions, insurance firms, consultancies and law firms considered how implementing these anonymization tools would facilitate compliance with the GDPR?

Based on this knowledge, at Pangeanic we have been working on turning anonymization into a tool that can be implemented not only in the public sector but also in the private business domain. This brings it closer to sectors such as law and finance, where enormous quantities of clients’ personal data are regularly handled and anonymization is a real need.