Throughout society, data impacts every aspect of our lives, whether we are aware of it or not. Each day, 306 billion emails are sent and 95 million photos and videos are uploaded, while 26 billion IoT devices are in circulation (Source: World Economic Forum).
By 2025, the amount of data generated each day is expected to reach 463 exabytes globally (that’s 463 followed by 18 zeros, in bytes). Growth is so fast that 90% of the world’s data was produced in the last two years alone.
The implications are clear. Data is the new fuel that powers enterprise, and it has quickly become a critical asset for managing operations and unlocking growth. This explosive growth brings both risk and opportunity, and both need to be understood and managed.
Because data is used to understand behavior and develop more effective products and services, it is being collected, analyzed, and shared like never before. It is also being hacked, stolen, and illegally sold. Fines are being levied, consumer confidence is being eroded, and the flow of information between organizations can become constrained as regulations impose new standards and expectations.
Simply put, having more data in more places correlates with an ever-increasing number of breaches of Personally Identifiable Information (PII) and with greater risk, which calls for tighter controls, oversight, and management.
Unfortunately, seven million data records are compromised every day, and on average only 5% of companies’ folders are properly protected (Source: Varonis). According to DLA Piper’s GDPR Data Breach Survey (January 2021), breach notifications rose by 19% over the past year, from 278 to 331 per day, continuing the trend of double-digit growth in breach notifications. There have been more than 281,000 data breach notifications since GDPR came into force in 2018.
Despite having security measures in place, large corporations continue to report personal and financial data being exposed or stolen. Organizations often need to go beyond physical protection of data and mask or change the data so that it cannot be traced back to the real person. These techniques for removing identifying information are known as “anonymization”.
Without anonymization, sensitive personal information is revealed whenever data is stolen or compromised. The following are just a few examples of companies that have incurred fines, leading to eroded customer confidence and share price declines.
Data privacy is a branch of data security concerned with the proper handling of data. In general, anonymization is an aggressive way to identify the categories of information that should be protected and then alter the data so that, even if it is hacked or later processed by a third party, an individual’s identity and other private information are not revealed. Anonymization happens when global identifiers or secondary data are identified and removed so that the data is less likely to be traced back to an individual. In practice, PII is either deleted (leaving a gap), replaced with a placeholder (a category label), or pseudonymized (replaced with another name).
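The three replacement strategies described above can be illustrated with a short Python sketch. It is not Pangea Masker’s implementation: it assumes the PII spans have already been detected (for example by a named-entity recognition model, which is not shown), and the `Entity` type, the `mask` function, and the `SURROGATES` table are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Literal

# Hypothetical entity type: PII spans are assumed to be detected upstream.
@dataclass(frozen=True)
class Entity:
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    label: str  # category, e.g. "PERSON" or "CITY"

Strategy = Literal["delete", "placeholder", "pseudonym"]

# Hypothetical surrogate values used for pseudonymization.
SURROGATES = {"PERSON": "Alex Smith", "CITY": "Springfield"}

def mask(text: str, entities: List[Entity], strategy: Strategy) -> str:
    """Replace every entity span in `text` according to `strategy`."""
    parts = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        parts.append(text[cursor:ent.start])  # keep the non-PII text
        if strategy == "delete":
            replacement = ""  # leave a gap
        elif strategy == "placeholder":
            replacement = f"[{ent.label}]"  # substitute a category label
        else:
            # Replace with another name; fall back to a label if no surrogate exists.
            replacement = SURROGATES.get(ent.label, f"[{ent.label}]")
        parts.append(replacement)
        cursor = ent.end
    parts.append(text[cursor:])  # trailing text after the last entity
    return "".join(parts)

if __name__ == "__main__":
    sentence = "Maria Garcia lives in Valencia."
    found = [Entity(0, 12, "PERSON"), Entity(22, 30, "CITY")]
    print(mask(sentence, found, "delete"))       # " lives in ."
    print(mask(sentence, found, "placeholder"))  # "[PERSON] lives in [CITY]."
    print(mask(sentence, found, "pseudonym"))    # "Alex Smith lives in Springfield."
```

For simplicity, this sketch maps every entity of a category to the same surrogate; a practical pseudonymization step would typically keep a consistent mapping per distinct original value, so that relationships between people and places in the text are preserved.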
A range of features can be included to train the system on specific terminology or to customize how anonymization is applied, making PII management more efficient and useful.
Pangeanic has long been a leader in using Artificial Intelligence (AI) to automate repetitive tasks, beginning more than 10 years ago with statistical and then neural machine translation for the language translation industry. The underlying technology, used to identify and replace words, train neural networks on custom terminology, and provide intelligent assistance to digital workflows, is combined with our standard multilingual NLP framework to provide a uniquely valuable solution.
This thought leadership and success have resulted in Pangeanic leading the MAPA Project, the EU’s first multilingual anonymization platform. Working with our partners, Pangeanic makes extensive use of transformer-based encoder models in this strategic initiative to identify personal identifiers (e.g., names and surnames, addresses, job titles and functions) and has established a deep taxonomy, which results in outstanding configurability and performance. Our experience also includes deploying similar solutions for financial institutions, the legal and public sectors, and enterprise IT.
Pangeanic is announcing an exciting new anonymization service: Pangea Masker. This product is designed to provide an intelligent, trainable resource that identifies and replaces private information across digital workflows. It is available as
Business and communication often happen across regional and language barriers, so Pangea Masker is integrated with a multilingual platform, making it easy to add other Natural Language Processing (NLP) services available from Pangeanic, such as machine translation, eDiscovery, summarization, and classification.
For more information, or to schedule a demo, Contact Us.