

28/11/2022

Multilingual Anonymization With Pangeanic's Masker

Compliance with data protection regulations has become a major concern for companies around the globe. Data protection laws apply to personal data held in all types of documents and databases, which often need to be anonymized. This raises an important question: in which languages can anonymization be done?

Personal data is an essential part of companies' day-to-day operations because it informs decision-making. Companies must nevertheless respect the privacy of the individuals involved, both to comply with the law and to maintain their trust. Anonymization has emerged as an increasingly popular technique for addressing this tension. If you handle personal data and want to avoid fines while complying with the data protection regulations of the countries in which you operate, anonymization is the answer.

Companies continually look for ways to make data easier to use and access while meeting security requirements. The law requires personal data to be retained while it is necessary for providing the relevant services. Once it is no longer needed, and after the minimum retention period has passed, the sensitive information can be removed or, in other words, anonymized. Although most companies have security measures in place, many have suffered breaches involving personal data and, as a result, financial penalties for non-compliance with the regulations of the countries in which they operate. Because each country applies its own rules, multilingual anonymization is an essential tool for complying with data protection regulations around the world.

Having an anonymization tool that can identify personal data in the source language and hide it correctly is a must for any company that handles personal data, whatever the volume. Pangeanic is a leading natural language processing company specializing in anonymization software, near-human-quality private machine translation, automatic data classification, relevance and sentiment analysis, and summarization. Our journey began in 2005 as a translation service provider. Since then, we have adopted new technologies and grown into a language technology and natural language processing company. We have spent years developing and improving our anonymization software, Masker, which allows global organizations to comply with data protection regulations around the world (GDPR, CCPA, HIPAA, APPI).
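To make "identify personal data and hide it" more concrete, here is a minimal Python sketch of the hiding step only. It assumes that an upstream detection model, which is hypothetical here, has already returned the character offsets and labels of each piece of personal data. The function name `mask_spans`, the labels and the `[LABEL]` placeholder format are illustrative assumptions, not Masker's actual API or output format. Because the function works on character offsets, the same step applies to Japanese text just as it does to English.

```python
from typing import List, Tuple

# (start, end, label): character offsets of one piece of personal data.
# In practice these would come from a detection model; here they are
# written by hand for illustration (hypothetical input).
Span = Tuple[int, int, str]


def mask_spans(text: str, spans: List[Span]) -> str:
    """Replace each span with a placeholder such as "[PERSON]".

    Assumes spans do not overlap. They are applied from the end of the
    string backwards, so replacing one span never shifts the offsets of
    the spans that are still waiting to be replaced.
    """
    masked = text
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return masked


if __name__ == "__main__":
    english = "Contact John Smith at john@example.com."
    print(mask_spans(english, [(8, 18, "PERSON"), (22, 38, "EMAIL")]))
    # prints: Contact [PERSON] at [EMAIL].

    japanese = "山田太郎さんの電話番号は090-1234-5678です。"
    print(mask_spans(japanese, [(0, 4, "PERSON"), (12, 25, "PHONE")]))
    # prints: [PERSON]さんの電話番号は[PHONE]です。
```

In this sketch the hard part, detecting the spans reliably in each language, is left to the model. The replacement step itself is language-independent.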

Our platform has been multilingual from the very beginning: as translation specialists, we are familiar with what it takes to provide multilingual services. How do we do it? We use machine learning based on the Transformer architecture. Transformers are built from attention mechanisms, which learn which parts of a sentence matter most and therefore need to be considered for anonymization.

Masker can anonymize data in a range of languages. The latest model we have trained and fully optimized is for Japanese. This is an important language to support, because Japan has the APPI (Act on the Protection of Personal Information). The APPI applies to all business operators, both individuals and entities, that handle personal information, and it is Japan's counterpart to the other laws mentioned above.
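The attention mechanism mentioned above can be illustrated with a short sketch. This is not Pangeanic's code and does not describe how Masker is built. It is a generic, minimal Python implementation of the standard scaled dot-product attention from the Transformer architecture, softmax(QKᵀ/√d_k)V, applied to toy vectors. The function names and inputs are illustrative assumptions. A real model adds learned projection matrices, multiple heads and many layers. In a typical Transformer-based tagger, the attention weights feed into later layers rather than deciding directly what gets anonymized.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V.

    weights[i][j] says how much query token i attends to token j;
    each row of weights sums to 1.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


if __name__ == "__main__":
    # Three toy token vectors used as queries, keys and values at once
    # (self-attention without the learned projections a real model has).
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    _, w = scaled_dot_product_attention(tokens, tokens, tokens)
    for row in w:
        print([round(x, 3) for x in row])
```

Each printed row shows how strongly one token attends to every token in the toy sequence. Tokens whose vectors point in similar directions receive higher weights.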