Working with aggregate data: what needs to be taken into account?

Data analysis is an effective practice in research, in the prediction of behaviors and trends and, consequently, in decision-making in any sector: business, commerce, science, education, government, etc.

Carrying out this statistical and strategic practice makes it possible to extract truly relevant information. Much of this information comes from aggregate data: summaries, such as averages, that are obtained from the individual data.

But what happens when these aggregate data come from personal data, from specific data that identify individuals, or from very sensitive data? What exactly needs to be taken into account?

Definition of aggregate data

To better understand the applications and risks of aggregate data, it is important to be familiar with the definition, how they are formed and their relationship with individual data and anonymous or anonymized data.

Aggregate data are a set of information compiled based on the average or summary of a group of individual data. Their purpose is to make comparisons, study and forecast trends, and obtain information of global importance.
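As a minimal sketch of this definition, the snippet below collapses a handful of individual records into one summary. The records, the field names and the helper aggregate_mean are hypothetical and serve only as illustration.

```python
from statistics import mean

# Hypothetical individual records (illustrative values, not real data).
individual_records = [
    {"person": "A", "region": "north", "monthly_spend": 120.0},
    {"person": "B", "region": "north", "monthly_spend": 80.0},
    {"person": "C", "region": "south", "monthly_spend": 200.0},
    {"person": "D", "region": "south", "monthly_spend": 100.0},
]


def aggregate_mean(records, value_key):
    """Collapse individual rows into one summary: a count and an average."""
    values = [record[value_key] for record in records]
    return {"count": len(values), "mean": mean(values)}


summary = aggregate_mean(individual_records, "monthly_spend")
print(summary)
```

The summary no longer says anything about person A or person D individually; it only describes the group as a whole.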

These grouped data may be numerical or non-numerical, and compiling them may involve processing personal data.

Normally, when aggregate data need to be used, the underlying individual data first go through an anonymization process: a method in which information that can serve as an identifier of the individual is removed.
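One simple way to picture this step is to drop the direct identifiers from each record before computing the aggregate. This is only a sketch: the field names in DIRECT_IDENTIFIERS, the sample customers and the helper strip_identifiers are all assumptions, and removing direct identifiers is just one anonymization technique among several.

```python
# Hypothetical set of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "national_id"}


def strip_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Illustrative customer records (invented values).
customers = [
    {"name": "Ana", "email": "ana@example.com", "age_band": "30-39", "balance": 1500.0},
    {"name": "Luis", "email": "luis@example.com", "age_band": "30-39", "balance": 2500.0},
]

# Anonymize first, aggregate afterwards.
anonymized = [strip_identifiers(customer) for customer in customers]
average_balance = sum(row["balance"] for row in anonymized) / len(anonymized)
print(anonymized)
print(average_balance)
```

Fields such as age_band remain in the anonymized rows, which is why the risks discussed later in this article still have to be considered.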

Aggregate vs. disaggregated data

As we have seen, data aggregation involves summarizing and compiling a group of individual data. But the much-needed practice of disaggregation also exists.

Disaggregated data come from the disaggregation or separation of the information units that make up the aggregate data. Therefore, these disaggregated data can be defined as the components that structure the aggregate information.

Aggregate data are essential for understanding and compiling important information, showing trends and predicting behavior. However, disaggregated data are indispensable for identifying certain underlying patterns.

Disaggregated data reveal trends that are not visible to the naked eye in the aggregate data.

For example, in country X, the aggregate share of men between 30 and 50 years of age who suffer from severe stress is 35%. But disaggregating this information reveals that only 5% of those who live outside a city suffer from it.
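The arithmetic behind this example can be sketched as follows. The survey counts are hypothetical, chosen only so that they reproduce the 35% and 5% figures; the split between city and non-city residents is an assumption, not something stated above.

```python
# Hypothetical survey counts, chosen to be consistent with the 35% and 5% figures.
groups = {
    "city": {"surveyed": 800, "stressed": 340},
    "outside_city": {"surveyed": 200, "stressed": 10},
}


def rate(stressed, surveyed):
    """Share of surveyed individuals who report severe stress."""
    return stressed / surveyed


# Aggregate view: one rate for the whole population.
total_surveyed = sum(group["surveyed"] for group in groups.values())
total_stressed = sum(group["stressed"] for group in groups.values())
aggregate_rate = rate(total_stressed, total_surveyed)

# Disaggregated view: one rate per subgroup.
disaggregated_rates = {
    name: rate(group["stressed"], group["surveyed"]) for name, group in groups.items()
}

print(f"aggregate: {aggregate_rate:.1%}")
for name, value in disaggregated_rates.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, the single aggregate figure hides a large gap between the two subgroups, which only appears once the data are broken down.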

As seen in the definition of aggregate data, these are usually limited to providing general, large-scale patterns or behaviors. To study specific factors, or to detect existing or emerging problems, it is necessary to break them down into more specific characteristics.

Usage and applications of aggregate data

Aggregate data provide valuable information and are normally used as the basis for important decisions affecting a company, an institution or a certain population. For example:

Financial analysts use aggregate data to identify the general inflation rate.
Banks collect customer data, anonymize it and use it for economic estimates or to identify trends in a particular user sector.
Governments can use these data to identify the results or effectiveness of a measure implemented in a country or city. They help them to consolidate or plan new strategies.
Educational campuses use aggregate data to determine school performance.

Risks of aggregate data

As explained above, when aggregate data are derived from personal data, care must be taken in the use and disclosure of such data in order to protect individuals' privacy. For this reason, the usual practice is to anonymize personal data before aggregating them.

However, there are data which, by their nature or the format in which they are presented, cannot be anonymized, or which, even if anonymized, can still be used to re-identify the individual they belong to by applying matching techniques. Examples of these are video or voice data and sensor data.
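To illustrate what a matching technique can look like, the sketch below links rows that carry no names to a second, named dataset through attributes the two share. Everything here is hypothetical: the attribute list QUASI_IDENTIFIERS, both datasets and the helper link_records are invented for illustration, and real matching techniques are usually more elaborate.

```python
# Hypothetical attributes present in both datasets.
QUASI_IDENTIFIERS = ("postal_code", "birth_year", "gender")

# "Anonymized" rows: names removed, sensitive attribute kept (invented values).
anonymized_rows = [
    {"postal_code": "46001", "birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"postal_code": "46002", "birth_year": 1975, "gender": "M", "diagnosis": "diabetes"},
]

# A separate, named dataset that an attacker might have access to (invented values).
public_rows = [
    {"name": "Person One", "postal_code": "46001", "birth_year": 1980, "gender": "F"},
    {"name": "Person Two", "postal_code": "46003", "birth_year": 1990, "gender": "M"},
]


def quasi_key(row):
    """Build the tuple of shared attributes used for matching."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link_records(anonymized, public):
    """Return (name, diagnosis) pairs where the shared attributes match exactly one named record."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])
    matches = []
    for row in anonymized:
        candidates = names_by_key.get(quasi_key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0], row["diagnosis"]))
    return matches


reidentified = link_records(anonymized_rows, public_rows)
print(reidentified)
```

In this toy case one of the two "anonymized" rows is linked back to a name, even though no direct identifier was ever stored with it.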

In addition, aggregate data may contain certain very specific data about the individual that, if someone sets out to exploit them, can reveal private habits and preferences which the owner has not consciously shared.

Often, if appropriate technological protection measures are not taken, artificial intelligence algorithms that process such data can discover relationships and trends that undermine the right to privacy.
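One possible protection measure, shown here only as an illustrative sketch and not as a method described in this article, is to withhold aggregates computed over very small groups, since a summary of one or two people says almost as much as their individual records. The threshold MIN_GROUP_SIZE, the sample records and the helper safe_group_means are assumptions.

```python
from collections import defaultdict

# Hypothetical minimum number of individuals required to publish a group figure.
MIN_GROUP_SIZE = 3


def safe_group_means(records, group_key, value_key, min_size=MIN_GROUP_SIZE):
    """Average per group, replaced by None when the group is too small to publish."""
    values_by_group = defaultdict(list)
    for record in records:
        values_by_group[record[group_key]].append(record[value_key])
    published = {}
    for group, values in values_by_group.items():
        if len(values) >= min_size:
            published[group] = sum(values) / len(values)
        else:
            published[group] = None  # suppressed: too few individuals
    return published


# Illustrative records (invented values).
records = [
    {"district": "centre", "visits": 4},
    {"district": "centre", "visits": 6},
    {"district": "centre", "visits": 8},
    {"district": "harbour", "visits": 12},
]

published = safe_group_means(records, "district", "visits")
print(published)
```

Here the figure for the district with a single resident is suppressed, because publishing it would disclose that person's value directly.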

This is why it is of utmost importance that every company, organization, association or government institution has a trusted aggregate data company that provides the effective technology to ensure the protection and proper use of the data.

Only an aggregate data company with experience in the processing of information using advanced technology can offer high quality in the data aggregation process for studies, planning, development and other activities.

Contact us to ensure the integrity, privacy, reliability and anonymization of data in the context of content processing and translations. At Pangeanic, we offer you advanced technological solutions that comply with data protection and the GDPR.