Few-Shot Learning (FSL) is an area of machine learning where models must accurately classify elements of new classes after seeing only a few examples of each. If you don’t already know what FSL is, or what support sets and query sets are, take a look at this introductory article by Etienne, a PhD at Sicara. You can also watch this video series in French.
In real-world setups, elements of the support set are usually labeled by human operators, which is prone to errors. We asked ourselves whether there is a simple way to automatically detect such outliers in a support set... and there actually is!
But first: why is it so important to detect and remove outliers in the support set?
What are outliers, and why do they matter?
Here are some examples of what we mean by outliers in the support set. These images were added by users of the model: probably because of a manipulation error, elements from other classes ended up in these two support set classes.
Our experience with Few-Shot Learning in production shows that such outliers seriously undermine the model’s live performance. They also hurt users’ trust in the model if users are not aware of the cause.
A major source of underperformance
As shown in the experiment below, the presence of mislabeled elements in a Few-Shot Learning support set quickly causes performance to drop: from about 75% accuracy on MiniImageNet to about 50% with 6 mislabeled elements in a support set of 25 images split across 5 classes. That is a drop of a third in accuracy for roughly 20% outliers.
Errors in the support set can therefore be a major source of underperformance for the model. This needs to be addressed to ensure that the theoretical performance of a Few-Shot Learning classifier is preserved in live situations.
Method: how to automatically find outliers?
A classical, commonly used Few-Shot Learning architecture is the Prototypical Network (ProtoNet). Such networks are trained to project input images into a vector space where same-class elements are close to each other and elements from different classes are far apart.
A natural idea is to apply classical outlier detection algorithms in that vector space, as shown below:
As you can see, we typically use a ResNet backbone. A backbone is the part of the model in charge of representing the input as a vector. A Prototypical Network uses these representations in the vector space to classify elements of the query set.
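As a rough sketch (not the exact production code), the ProtoNet classification step described above can be written as follows. Here `backbone` stands for any feature extractor, e.g. a ResNet with its classification head removed; all names are illustrative:

```python
import torch

def classify_queries(backbone, support_images, support_labels, query_images):
    """Prototypical Networks inference: assign each query image to the
    class whose prototype (mean support embedding) is closest."""
    with torch.no_grad():
        z_support = backbone(support_images)  # (n_support, d)
        z_query = backbone(query_images)      # (n_query, d)
    # One prototype per class: the mean of that class's support embeddings
    classes = torch.unique(support_labels)
    prototypes = torch.stack(
        [z_support[support_labels == c].mean(dim=0) for c in classes]
    )  # (n_classes, d)
    # Predict the class of the nearest prototype (Euclidean distance)
    distances = torch.cdist(z_query, prototypes)  # (n_query, n_classes)
    return classes[distances.argmin(dim=1)]
```

The same `z_support` embeddings are what we feed to the outlier detection algorithms described in the next section.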
Which outlier detection algorithms?
After testing out many existing outlier detection algorithms, I found that Isolation Forest and K-Nearest-Neighbours work best by a wide margin.
- Isolation Forest fits an ensemble of random trees to the data and assigns each item an outlier score based on how few random splits are needed to isolate it from the rest: outliers are easier to separate, so they are isolated closer to the root.
- K-Nearest-Neighbours (KNN) computes, for each point, the average distance to its K closest neighbours; this average is the point’s outlier score.
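Both scoring schemes are easy to sketch with scikit-learn. In this illustrative snippet (not the exact code used in the experiments), `features` stands for the backbone’s feature vectors for the elements of a support set:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

def isolation_forest_scores(features):
    """Outlier score per element; higher = more outlier-like.
    (sklearn's score_samples uses the opposite convention, so we negate it.)"""
    forest = IsolationForest(random_state=0).fit(features)
    return -forest.score_samples(features)

def knn_scores(features, k=3):
    """Outlier score = mean distance to the k nearest neighbours."""
    # k + 1 neighbours because each point is its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    distances, _ = nn.kneighbors(features)
    return distances[:, 1:].mean(axis=1)
```

Elements with the highest scores are the candidates to flag for human review.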
Experiments to evaluate this method
In order to evaluate the above method, I put in place a reproducible pipeline, shown below, in which elements are randomly drawn from test set classes (meaning the model backbone never saw them during training).
Each experiment randomly draws 2 distinct classes, one being the inlier class and the other the outlier class. A fixed number of elements is then drawn from each class, keeping the outlier proportion constant.
The feature vectors are computed and outlier scores are assigned to each element from the sample. We then use AuROC (Area under ROC curve) and Precision at 80% Recall as our main metrics (i.e. how precise the method is when it detects 80% of all outliers).
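Both metrics can be computed directly from the outlier scores with scikit-learn. This is a minimal sketch under the same conventions as above (higher score = more outlier-like), not the exact evaluation code:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_auc_score

def evaluate_scores(is_outlier, scores, min_recall=0.8):
    """AuROC, plus the best precision achievable while still
    detecting at least `min_recall` of all outliers."""
    auroc = roc_auc_score(is_outlier, scores)
    precision, recall, _ = precision_recall_curve(is_outlier, scores)
    # Best precision among thresholds that keep recall >= min_recall
    precision_at_recall = precision[recall >= min_recall].max()
    return auroc, precision_at_recall
```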
I used this pipeline to evaluate the method on 2 main datasets, which are widely used in Few-Shot Learning:
- CU-Birds: a dataset of bird photographs containing 6,033 images split across 200 classes.
- Mini-ImageNet: a subset of ImageNet containing 100 classes with 600 images each.
Below are my results on samples drawn from classes outside of the backbone’s training set. 5-shot experiments are on samples containing 4 inliers and 1 outlier, whereas 10-shot experiments are on samples containing 9 inliers and 1 outlier.
On CU-Birds, I achieve more than 90% AuROC for 5-shot and 95% for 10-shot. Similarly, I get more than 60% Precision at 80% Recall in both settings. Isolation Forest has the better AuROC, while KNN seems to have an edge on Precision at 80% Recall.
On Mini-ImageNet, performance is similar, although a bit lower. AuROC stays relatively high, but at 80% recall we only get 54% precision for 5-shot and 35% for 10-shot.
An easy way to boost your Few-Shot Learning classifier’s live performance
A live situation where Few-Shot Learning is used is analogous to a 5-shot setup. Going by this method’s performance on CU-Birds, you can theoretically eliminate 80% of the outliers in your support set while only having to review about 35% false positives among the flagged elements.
We saw that 20% of outliers in the support set can cause a model’s performance to drop by a third. Using this method, you can go from 20% of outliers to <5%, which should almost get you back to your model’s theoretical performance with a clean support set.
Since the cost of eliminating these outliers with such a simple method is so small, systematically applying outlier detection when deploying Few-Shot Learning models to production could become common practice.