What Is Bias and How Do We Perceive It?

The term bias refers to a distortion of judgment in favor of or against something or someone. There are many types of bias, including gender, cultural, political, statistical, and cognitive bias.

Bias in the field of Artificial Intelligence

In the field of AI, bias occurs when algorithms produce unfair or discriminatory results because of erroneous assumptions in the machine learning process. AI systems are created by humans, so they often reflect human attitudes toward personal and social characteristics such as religion, race, or gender.

According to researchers at the Penn State College of Information Sciences and Technology (IST), natural language processing models, which power applications such as spam filters and virtual assistants, carry an implicit bias against people with disabilities. The study explored 13 public models, according to Pranav Venkit, one of the researchers involved.

The phenomenon even affects large companies. As reported by Reuters, Amazon spent years building software to review job applicants' resumes and automatically find the most suitable candidates. The company's recruiting tool used artificial intelligence to score candidates from one to five stars.

In 2015, the company realized that the model had a gender bias. It had been trained on patterns in resumes submitted to the company over a 10-year period, most of which came from men.

This phenomenon also occurs in machine translation. Translation engines tend to assign a gender to professions according to who most often held those positions in the past. Although machine translation engines have improved, examples of this bias can still be observed.

Bias can also occur in automatic classification. This was the case in a competition hosted on the Kaggle platform to score comments by toxicity. The resulting models classified some non-toxic comments as toxic. The reason is that comments mentioning minority groups, such as "feminists," "Muslims," "blacks," or "gays," were often classified as toxic even though they were not toxic in and of themselves.
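One way to surface this kind of problem is to compare the false positive rate on comments that mention a group with the overall rate. The sketch below is only an illustration: the comments, labels, predictions, and the names `EXAMPLES`, `IDENTITY_TERMS`, and `fpr_by_term` are all invented here and do not come from the competition or from any particular model.

```python
# Hypothetical labelled examples: (comment, true_label, predicted_label),
# where 1 means "toxic" and 0 means "non-toxic". Invented for illustration.
EXAMPLES = [
    ("I am a proud feminist", 0, 1),
    ("Muslims celebrate Eid this week", 0, 1),
    ("The weather is nice today", 0, 0),
    ("My gay friend got married", 0, 0),
    ("You are an idiot", 1, 1),
    ("Feminists organised the meeting", 0, 1),
    ("Thanks for the helpful answer", 0, 0),
]

# Assumed list of terms to audit; a real audit would use a curated list.
IDENTITY_TERMS = ["feminist", "muslim", "gay"]


def false_positive_rate(rows):
    """Share of truly non-toxic comments that were predicted toxic."""
    negatives = [row for row in rows if row[1] == 0]
    if not negatives:
        return None  # undefined when there are no non-toxic comments
    false_positives = sum(1 for row in negatives if row[2] == 1)
    return false_positives / len(negatives)


def fpr_by_term(rows, terms):
    """False positive rate over the comments that mention each term.

    Matching is a simple case-insensitive substring test.
    """
    result = {}
    for term in terms:
        subset = [row for row in rows if term in row[0].lower()]
        result[term] = false_positive_rate(subset)
    return result


overall = false_positive_rate(EXAMPLES)
per_term = fpr_by_term(EXAMPLES, IDENTITY_TERMS)

print(f"overall false positive rate: {overall:.2f}")
for term, rate in per_term.items():
    print(f"{term}: {rate:.2f}")
```

A term whose false positive rate sits well above the overall rate suggests that the model reacts to the mention of the group rather than to the content of the comment.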

Can bias be eliminated from AI algorithms?

As Dr. Sanjiv M. Narayan of Stanford University School of Medicine noted, "All data is biased. This is not paranoia. This is a fact."

Eliminating bias altogether is a complicated task, but some steps toward correcting or mitigating it in AI systems involve examining both the algorithm and the data. For example, it must be determined whether the training data set is representative enough. By observing the modeling process, biases can be identified and the reasons for their occurrence understood.
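As a minimal sketch of the representativeness check, the snippet below compares the share of each group in a data set with a reference share and reports the groups that deviate by more than a tolerance. The group labels, the reference shares, the 10-point tolerance, and the function names are assumptions made for this example, not an established standard.

```python
from collections import Counter


def group_shares(labels):
    """Return the fraction of the data set that belongs to each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_gaps(labels, reference, tolerance=0.10):
    """Return groups whose observed share differs from the reference
    share by more than `tolerance`, as {group: (observed, expected)}."""
    shares = group_shares(labels)
    gaps = {}
    for group, expected in reference.items():
        observed = shares.get(group, 0.0)  # a missing group counts as 0
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps


# Hypothetical training set: one group label per example.
training_groups = ["male"] * 80 + ["female"] * 20

# Assumed reference shares for the population the model will serve.
reference_shares = {"male": 0.5, "female": 0.5}

gaps = representation_gaps(training_groups, reference_shares)
for group, (observed, expected) in gaps.items():
    print(f"{group}: observed {observed:.0%}, expected {expected:.0%}")
```

An empty result does not prove that the data is unbiased. It only shows that the group proportions checked here are close to the chosen reference.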

In addition, consideration should be given to which processes are best handled by AI and which are better left to humans. Continued research in this field is also essential. AI models are created by people, and each person has a different vision and set of values, and therefore biases acquired over a lifetime. Diversity means taking a wide variety of perspectives into account. The fact that one person does not detect a bias does not mean that another person will miss it too. This was the case for computer scientist Joy Buolamwini, who discovered racial bias in facial detection systems by using them on her own face.

Pangeanic and bias

Pangeanic offers several services in the field of artificial intelligence, including:

- Automatic classification.

- Machine translation.

- Sentiment analysis.

- Anonymization.

These services are prone to bias if representative data sets are not used.

AI systems learn to make decisions based on data, so it is essential that the data sets used to train the algorithms are developed in a controlled and responsible manner.

For this reason, our qualified employees manually label the data used for several of these tasks.

Biased data implies a biased algorithm, and thus unfair or discriminatory results.
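To make this concrete, here is a toy sketch in which a "model" simply learns the most frequent pronoun seen with each profession. The corpus, its counts, and the `train` function are invented for illustration and do not describe how any real translation or classification engine works.

```python
from collections import Counter, defaultdict

# Invented (profession, pronoun) pairs standing in for skewed training data.
CORPUS = (
    [("doctor", "he")] * 9
    + [("doctor", "she")] * 1
    + [("nurse", "she")] * 8
    + [("nurse", "he")] * 2
)


def train(pairs):
    """Learn, for each profession, the pronoun that appears most often."""
    counts = defaultdict(Counter)
    for profession, pronoun in pairs:
        counts[profession][pronoun] += 1
    return {
        profession: pronoun_counts.most_common(1)[0][0]
        for profession, pronoun_counts in counts.items()
    }


model = train(CORPUS)
for profession, pronoun in model.items():
    print(f"{profession} -> {pronoun}")
```

The model never sees an explicit rule about gender. It only reproduces the imbalance already present in the data it was given.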

In summary, it is essential to use sufficiently large and representative data sets in order to avoid discriminatory outcomes such as cultural or racial bias.