How to implement custom object detection with template matching. No annotated data needed!
Today, state-of-the-art object detection algorithms (algorithms that detect objects in pictures) use neural networks such as YOLOv4.
Template matching is a technique in digital image processing for finding small parts of an image that match a template image. It is a much simpler solution than a neural network for object detection. In addition, it comes with the following benefits:
- no need to annotate data (a time-consuming and mandatory task to train neural networks)
- bounding boxes are often more accurate, since each box takes the exact size of its template
- no need for GPU
In my experience, combining a neural network like YOLOv4 with template-matching object detection is a good way to considerably improve your neural network's performance!
What is template matching?
When you use OpenCV template matching, your template slides pixel by pixel over your image. For each position, a similarity metric is computed between your template image and the part of the image it covers:
If the similarity metric is high enough for one pixel, then this pixel is probably the top-left corner of an object matching your template!
Consequently, you can achieve object detection with template matching only if the objects you try to detect are similar enough (almost identical) within a class. You can still add more templates to handle object variations (size, color, orientation), but each extra template increases the prediction time.
At first glance, this seems very restrictive. But a lot of object detection use cases can be tackled with template matching:
- ID in scanned documents
- empty parking space from a stationary camera
- components on an assembly line...
A Practical Example
A good use case for object detection using template matching is to detect components on printed circuits, such as this one:
We could imagine an assembly line producing such circuits. Let’s imagine that some of the manufactured circuits are missing components and are thus defective. We could install a camera at the end of the line and photograph each circuit in order to filter out defective products. We can achieve this with object detection with template matching!
For the sake of simplicity, we will focus on the detection of a few components.
A first component appearing twice:
This one appearing four times:
And this third one appearing six times:
We choose these three images as templates. This reduces the complexity of the use case: we should easily detect at least the objects chosen as templates.
Basic object detection with template matching
Firstly, we define templates from:
- an image path,
- a label,
- a color (for result visualization: the color of bounding boxes and labels),
- and a matching threshold.
Secondly, we consider that all pixels having a similarity metric above this threshold indicate a detection for this template.
Here is the code defining templates:
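A minimal sketch of such a template definition could look like the following. The `Template` class name, the file paths, and the threshold values are hypothetical placeholders, not the values used for the results in this article; the assignment of colors to components is arbitrary too.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Template:
    """Everything needed to match one template and display its detections."""
    image_path: str               # where the template image is stored
    label: str                    # class name attached to each detection
    color: Tuple[int, int, int]   # BGR color used to draw boxes and labels
    matching_threshold: float     # minimum similarity score for a detection


# Hypothetical paths and thresholds, for illustration only.
templates = [
    Template("component1.jpg", "1", (0, 255, 0), 0.8),    # green
    Template("component2.jpg", "2", (0, 255, 255), 0.8),  # yellow
    Template("component3.jpg", "3", (0, 0, 255), 0.8),    # red
]
```

Keeping the threshold on the template itself makes it easy to tune each class separately later on.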
Detecting objects with template matching
Then, we loop over templates to perform object detection with template matching for each template. Because we are using a threshold, we select a normalized similarity metric (TM_CCOEFF_NORMED) when applying template matching. Its scores are bounded (between -1 and 1, where 1 is a perfect match), so we can pick a threshold between 0 and 1:
We consider that each pixel having a similarity score above the template threshold is the top-left corner of an object (with the template’s height, width, and label).
Visualize detected objects
Then, we plot the predicted bounding boxes of this object detection with template matching on the input image:
Finally, we obtain the following results:
As indicated by the thickness of boxes (in green, yellow, and red), each object has been detected several times.
Why did we obtain duplicated detections? As explained above, OpenCV template matching returns a 2-D matrix with one cell, and thus one similarity score, for each possible top-left position of the template in the input image (for an H × W image and an h × w template, its size is (H - h + 1) × (W - w + 1)).
Therefore, if an object is detected at one location, the surrounding pixels will most likely have very similar scores, and are thus also considered to be top-left corners of objects.
To tackle this issue, we will sort all detections by decreasing matching values. Then, we will choose whether or not to validate each detection. We validate the detection if it is not overlapping too much with any of the already validated detections. Finally, we determine that two detections are overlapping if the Intersection over Union of their bounding boxes is above a given threshold. This process is called Non-Maximum Suppression.
Here is a visual explanation of what Intersection over Union (IoU) is:
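In formula form, for two bounding boxes $A$ and $B$, the standard definition is:

```latex
\mathrm{IoU}(A, B) = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}
                   = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A) + \operatorname{area}(B) - \operatorname{area}(A \cap B)}
```

As a made-up example, take two 10 × 10 boxes where the second is shifted 5 pixels to the right of the first. The intersection is 5 × 10 = 50, the union is 100 + 100 - 50 = 150, and the IoU is 50 / 150 ≈ 0.33. Identical boxes have an IoU of 1, and boxes that do not overlap have an IoU of 0.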
Here is how I implemented it (the compute IoU method, along with more explanations, can be found here):
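A minimal, dependency-free sketch of the procedure described above could look like this. It is not the original implementation: the function names, the detection dictionary layout, and the default IoU threshold of 0.4 are assumptions, and this version compares detections across all labels.

```python
def compute_iou(a, b):
    """Intersection over Union of two boxes given as x, y, width, height."""
    x_left = max(a["x"], b["x"])
    y_top = max(a["y"], b["y"])
    x_right = min(a["x"] + a["width"], b["x"] + b["width"])
    y_bottom = min(a["y"] + a["height"], b["y"] + b["height"])
    if x_right <= x_left or y_bottom <= y_top:
        return 0.0  # the boxes do not overlap
    intersection = (x_right - x_left) * (y_bottom - y_top)
    union = a["width"] * a["height"] + b["width"] * b["height"] - intersection
    return intersection / union


def non_max_suppression(detections, iou_threshold=0.4):
    """Keep the best-scoring detections that do not overlap too much."""
    ordered = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for candidate in ordered:
        # Validate a candidate only if it overlaps no validated detection
        # by more than the IoU threshold.
        if all(compute_iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    return kept


# Illustrative detections: the second is a near-duplicate of the first.
detections = [
    {"x": 20, "y": 10, "width": 16, "height": 12, "score": 0.99},
    {"x": 21, "y": 10, "width": 16, "height": 12, "score": 0.95},
    {"x": 80, "y": 60, "width": 16, "height": 12, "score": 0.97},
]
kept = non_max_suppression(detections)
print([(d["x"], d["y"]) for d in kept])  # [(20, 10), (80, 60)]
```

The near-duplicate at x=21 has an IoU of 180 / 204 ≈ 0.88 with the best detection, so it is discarded, while the distant box is kept.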
And then, I just added these two lines after the detection loop:
As a result, we obtain:
Much cleaner! We now clearly see that all instances of the first and third components are detected without any false positives (precision and recall of 1).
We now want to reduce the number of false positives for component 2. The easiest way is to increase the matching threshold for the template used for this label.
It is of course better to compute object detection metrics on various images to choose hyperparameters (template matching threshold and Non-Maximum Suppression threshold). For now, we can simply increase the threshold for component 2:
And we obtain:
We now have only two false positives for component 2, instead of dozens (precision of 2/3, recall of 1)! In addition, components 1 and 3 are still perfectly detected (precision and recall of 1).
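For reference, these figures follow from the usual definitions of precision and recall. Component 2 appears four times on the circuit, all four instances are found, and two extra detections are wrong. The helper below is a hypothetical illustration of that arithmetic.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Component 2: four correct detections, two false positives, none missed.
precision, recall = precision_recall(
    true_positives=4, false_positives=2, false_negatives=0
)
print(precision, recall)
```

Here the precision is 4 / 6 = 2/3 and the recall is 4 / 4 = 1.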
To improve our results we could:
- include templates for the components mistaken with component 2.
- try several similarity metrics
- annotate a few images to compute detection metrics, and perform a grid search on these parameters.
We have achieved object detection with template matching by:
- defining at least one template for each object (the more templates you use for one object, the higher your recall and the lower your precision tend to be)
- using OpenCV template matching method on the image for each template
- considering that each pixel having a similarity score above a template threshold is the top-left corner of an object (with this template’s height, width, and label)
- applying Non-Maximum Suppression to the detections obtained
- choosing template thresholds to improve detection accuracy!