Zero-Shot Learning in NLP

Introduction

In the supervised learning paradigm, labeled datasets are an essential resource: models learn by finding patterns in labeled examples.

However, the reality is that most of the data around us is unlabeled (or unclassified), and the labeling process is resource-intensive, time-consuming, and costly.

To address this problem, a technique called zero-shot learning (ZSL) was devised. Originally developed in computer vision, it has had a strong impact in recent years in the field of natural language processing (NLP).

What is zero-shot learning?

Zero-shot learning is a problem setup in machine learning in which, at the time of testing, a learner observes samples from classes that were not observed during training and needs to predict the class to which they belong.
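To make this setup concrete, below is a minimal sketch of the classic attribute-based approach to ZSL: every class, seen or unseen, is described by a semantic attribute vector, and a test sample is assigned to the unseen class whose attributes best match the attributes predicted from its features. The attribute vectors and the example classes here are invented purely for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Semantic descriptions of the *unseen* classes as attribute vectors
# (has_stripes, has_mane, is_domestic) -- toy values for illustration.
unseen_classes = {
    "zebra": (1.0, 1.0, 0.0),
    "housecat": (0.0, 0.0, 1.0),
}

def classify_unseen(predicted_attributes):
    """Assign the sample to the unseen class whose attribute
    vector is most similar to the predicted attributes."""
    return max(unseen_classes,
               key=lambda c: cosine(predicted_attributes, unseen_classes[c]))

# Suppose a model trained only on seen classes (e.g. horses, tigers)
# predicts these attributes for a new image:
print(classify_unseen((0.9, 0.8, 0.1)))  # striped, maned, wild -> "zebra"
```

The key idea is that the attribute space is shared between seen and unseen classes, so knowledge learned from seen classes transfers to classes never observed during training.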

Recently, especially in NLP, it has been used increasingly often to get a model to do something it has not been explicitly trained to do. A well-known example is the GPT-2 language model, which can perform tasks such as machine translation without prior fine-tuning on those tasks.
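In NLP, zero-shot classification is commonly framed as textual entailment: each candidate label is turned into a hypothesis such as "This text is about {label}.", and a natural language inference (NLI) model scores how strongly the input text entails it. The sketch below keeps that framing but replaces the NLI model with a hypothetical word-overlap scorer, so it only illustrates the structure, not real model quality.

```python
def entailment_score(premise, hypothesis):
    """Stand-in for a real NLI model: score the hypothesis by
    word overlap with the premise (toy heuristic, illustration only)."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h)

def zero_shot_classify(text, candidate_labels):
    """Pick the label whose hypothesis the text 'entails' most strongly.
    The labels are supplied at inference time -- no label-specific training."""
    return max(candidate_labels,
               key=lambda label: entailment_score(
                   text, f"this text is about {label}"))

print(zero_shot_classify(
    "The latest sports news: the striker scored twice",
    ["politics", "sports", "cooking"]))  # -> "sports"
```

A production system would swap the toy scorer for a pretrained NLI model; the surrounding logic, where candidate labels arrive only at inference time, is what makes the setup zero-shot.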

Types of zero-shot learning

The ZSL problem can be divided into categories based on the data used during the training phase and the test phase.

  • Data present in the training phase:
    • Inductive: Only labeled data from the seen classes is available during training. The main objective is to transfer semantic knowledge (e.g., class attributes or embeddings) from the seen classes so that the model can recognize objects of classes not seen until test time.
    • Transductive: Unlabeled examples from the unseen classes are also available during training. This setting is useful in practical situations where we have access to a large number of images but labeling or annotating each one is not feasible or requires a lot of work. Compared to the inductive setting, the transductive one is somewhat easier, since the model gains some knowledge about the distribution of the unseen classes' visual features.
  • Data present during the test phase:
    • Conventional: Test samples are assumed to come only from unseen classes. From a practical perspective, this configuration is less useful, since in realistic scenarios that assumption is difficult to guarantee.
    • Generalized: Test samples may come from both seen and unseen classes. From a practical perspective, this is more useful and realistic, but also much more difficult than the conventional configuration. The reason is that the model has been trained only with data from seen classes, so its predictions are biased towards them. As a result, many samples from unseen classes are misclassified into seen classes at test time, which drastically reduces performance.
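A simple and widely used mitigation for this seen-class bias is calibrated stacking: subtract a fixed calibration constant from the scores of the seen classes before taking the argmax, giving unseen classes a fair chance. The class names, scores, and constant below are hypothetical.

```python
def predict_gzsl(scores, seen_classes, gamma=0.3):
    """Generalized zero-shot prediction with calibrated stacking:
    penalize seen-class scores by a calibration constant gamma."""
    calibrated = {c: (s - gamma if c in seen_classes else s)
                  for c, s in scores.items()}
    return max(calibrated, key=calibrated.get)

# Hypothetical model scores: the model, trained only on the seen
# classes, is overconfident on "horse" even for a zebra image.
scores = {"horse": 0.55, "tiger": 0.10, "zebra": 0.35}
seen = {"horse", "tiger"}

print(predict_gzsl(scores, seen))           # bias corrected -> "zebra"
print(predict_gzsl(scores, seen, gamma=0))  # no calibration -> "horse"
```

In practice, gamma is tuned on a validation split so that accuracy on seen and unseen classes is balanced.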

How zero-shot learning is applied at Pangeanic

As mentioned above, obtaining large amounts of high-quality labeled data is difficult. Applying zero-shot learning therefore reduces our models' dependence on labeled data. Evaluating models on data they have not seen during training also yields more realistic assessments, since it forces them to generalize to patterns absent from the training set.

At Pangeanic, we experiment with different zero-shot and few-shot approaches, and we keep track of new techniques and experiments being carried out in order to improve our models' output.

We are a natural language processing company specialized in anonymization software, near-human-quality private machine translation, automatic data classification, relevance and sentiment analysis, and summarization. We combine AI with human creativity to offer the best technological solutions.