PangeaMT Masker

463 Exabyte Reasons to Start Anonymizing Now

Throughout society, data impacts every aspect of our lives, whether we are aware of it or not. Each day, 306 billion emails are sent and 95 million photos or videos are uploaded, while 26 billion IoT devices are in circulation (Source: World Economic Forum).

By 2025, the amount of data generated each day is expected to reach 463 exabytes globally (in bytes, that’s 463 followed by 18 zeros). In fact, growth is so fast that 90% of the world’s data has been produced in the last two years alone.

The implications are clear. Data is the new fuel that powers enterprise, and it has quickly become a critical asset for managing operations and unlocking growth. With this explosive growth comes both risk and opportunity that should be understood and managed.

Because data is used to understand behavior and develop more effective products and services, it’s being collected, analyzed and shared like never before. It is also being hacked, stolen, and illegally sold. Fines are being levied, consumer confidence is being eroded, and information flow between organizations can become constrained as regulations impose new standards and expectations.

Simply put, having more data everywhere correlates with ever-increasing breaches of Personally Identifiable Information (PII) and increased risk that requires tighter controls, oversight, and management.

Unfortunately, seven million data records are compromised every day, and on average only 5% of companies’ folders are properly protected (Source: Varonis). According to DLA Piper’s GDPR Data Breach Survey of January 2021, breach notifications rose 19% over the past year, from 278 to 331 per day, continuing the trend of double-digit growth. More than 281,000 data breach notifications have been made since GDPR took effect in 2018.

Despite having security measures in place, large corporations continue to report personal and financial data being exposed or stolen. It is often necessary for organizations to go beyond physical protection of data and to mask or change the data to ensure there is no traceability back to the real person. These techniques for removing identifying information are known as “anonymization”.

Without anonymization, when data is stolen or compromised, sensitive personal information is revealed. The following are just a few examples of companies that have incurred fines and costs, along with eroded customer confidence and share price declines.

  • Yahoo! exposed 3 billion user accounts, paid $117 million in direct penalties and costs, and lost $350 million in market capitalization.
  • In the financial sector, Equifax had 145 million usernames, birthdates, and social security numbers stolen, resulting in $575 million in fines and over $4 billion in total costs.
  • Marriott International had 389 million hotel guest records stolen and took a $126 million accounting charge.
  • In health insurance, Anthem Health Insurance had 79 million member records stolen, resulting in a $16 million HIPAA fine.

Six Critical Reasons Why An Organization Should Anonymize Datasets:

  1. Protect consumer privacy. US and international regulations now require protection of a consumer’s privacy. Violation of a regulation may lead to fines and a negative impact on a company’s reputation.
  2. Minimize the risk of fines or loss of market value. If anonymized data is stolen or compromised, it lacks private information and so has minimal value to those wanting to resell it in illegal data markets.
  3. Safely analyze operational data. Anonymized data can still be analyzed internally, providing a safe resource to increase sales and operational efficiencies.
  4. Reduce the risk of reputational harm. Breaches of anonymized data don’t reveal PII, and a breach may even reflect more positively on a company that took the extra steps to protect its customers’ privacy.
  5. Monetize data assets. Anonymization unlocks the strategic value of data, as it can then be safely sold, opening up new revenue streams and monetization strategies.
  6. Improve communication efficiency. AI-driven anonymization of communication across organizations helps automatically filter out PII, leading to a more efficient flow of relevant information.

Anonymization: a necessary resource to protect PII

Data privacy is a branch of data security concerned with the proper handling of data. In general, anonymization is an aggressive means of protecting it: categories of information that should be protected are identified, then the data is altered so that, even when hacked or subsequently processed by a third party, an individual’s identity or other private information is not revealed. Anonymization happens when global identifiers or secondary data are identified and removed so that the data is less likely to be traced back to an individual. This means that PII is replaced with gaps (deleted), with placeholders (substituted with a category label), or with pseudonyms (pseudo-anonymization, in which it is replaced with another name).
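
The three replacement strategies just described can be illustrated with a minimal Python sketch. It is not Pangea Masker’s implementation or API: the function name anonymize, the mode names, and the two regular expressions are assumptions made purely for illustration, and a production system would typically detect PII with trained language models rather than patterns like these.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately simple patterns for two PII categories.
# Named groups let one pass over the text report which category matched.
PII_PATTERN = re.compile(
    r"(?P<EMAIL>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<PHONE>\+?\d[\d\s-]{7,}\d)"
)

MODES = ("delete", "placeholder", "pseudonym")


def anonymize(text: str, mode: str = "placeholder") -> str:
    """Mask emails and phone numbers using one of three strategies.

    delete      -> remove the value, leaving a gap
    placeholder -> substitute a category label such as [EMAIL]
    pseudonym   -> substitute a consistent stand-in such as EMAIL_1
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    pseudonyms = {}              # original value -> stand-in
    counters = defaultdict(int)  # per-category running number

    def replace(match: re.Match) -> str:
        label = match.lastgroup  # "EMAIL" or "PHONE"
        value = match.group(0)
        if mode == "delete":
            return ""
        if mode == "placeholder":
            return f"[{label}]"
        # Pseudonym: the same original value always maps to the same stand-in.
        if value not in pseudonyms:
            counters[label] += 1
            pseudonyms[value] = f"{label}_{counters[label]}"
        return pseudonyms[value]

    return PII_PATTERN.sub(replace, text)


sample = "Contact ana@example.com or call +34 600 123 456. Reply to ana@example.com."
print(anonymize(sample, "placeholder"))
# Contact [EMAIL] or call [PHONE]. Reply to [EMAIL].
print(anonymize(sample, "pseudonym"))
# Contact EMAIL_1 or call PHONE_1. Reply to EMAIL_1.
```

Deletion gives the strongest protection but the least readable text, placeholders keep the category visible for later analysis, and consistent stand-ins preserve the fact that two mentions refer to the same person or contact.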

A range of features can be included to train an anonymization system on specific terminology or to customize how it is applied, enabling greater efficiency and utility in managing PII.

How is Pangeanic’s new Anonymization service so uniquely valuable?

Pangeanic has long been a leader in using Artificial Intelligence (AI) to automate repetitive tasks, beginning over 10 years ago with statistical and then neural machine translation for the language translation industry. The underlying technology, which is used to identify and replace words, train neural networks on custom terminology, and provide intelligent assistance to digital workflows, is combined with our standard multilingual NLP framework to provide a uniquely valuable solution.

This kind of thought leadership and success has resulted in Pangeanic leading the MAPA Project, the EU’s first multilingual anonymization platform. This strategic initiative, carried out with our partners, makes extensive use of bilingual encoders for transformers to identify personal identifiers (e.g., names and surnames, addresses, job titles and functions) and establishes a deep taxonomy, which results in outstanding configurability and performance. Our experience includes deploying similar solutions for financial institutions, the legal and public sectors, and enterprise IT.

Introducing: Pangea Masker

Pangeanic is announcing an exciting new anonymization service: Pangea Masker. This product is designed to provide an intelligent, trainable resource to identify and replace private information along digital workflows. It is available in three forms:

  1. as a SaaS subscription,
  2. integrated with existing platforms through an API, or
  3. deployed on-premise.

Pangea Masker utilizes a deep adaptive learning capability that trains the AI system for unique terminology or domains. Neural engines can then be cloned and managed across complex specialized language environments.

Business and communication often happen across regional and language barriers, so Pangea Masker is integrated with a multilingual platform, making it easy to add other Natural Language Processing (NLP) services available from Pangeanic, such as machine translation, eDiscovery, summarization and classification.

For more information, or to schedule a demo, Contact Us.