R-CNN
Given an image with multiple objects, we generate a set of regions of interest (RoIs) using a proposal method (Selective Search) and warp each region to a fixed size.
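The warping step can be sketched in a few lines. This is a simplified illustration, assuming boxes given as (x1, y1, x2, y2) pixel coordinates and using nearest-neighbour sampling. The 227×227 output size is the AlexNet input size used in the R-CNN paper. The helper name warp_region and the proposal boxes below are hypothetical and are not produced by Selective Search.

```python
import numpy as np

def warp_region(image, box, out_size=227):
    """Crop box = (x1, y1, x2, y2) from an H x W x C image and resize it to
    out_size x out_size with nearest-neighbour sampling.
    Aspect ratio is ignored, so every region ends up the same shape."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # For each output pixel, pick the nearest source row/column inside the crop.
    rows = (np.arange(out_size) * h / out_size).astype(int)
    cols = (np.arange(out_size) * w / out_size).astype(int)
    return crop[rows[:, None], cols[None, :]]

# Hypothetical image and proposals (not real Selective Search output).
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
proposals = [(10, 20, 110, 70), (200, 100, 260, 400)]
warped = np.stack([warp_region(image, b) for b in proposals])
print(warped.shape)  # (2, 227, 227, 3)
```

Whatever the original box shape, each region comes out at the same size, which is what lets a CNN with a fixed input size process them. The paper also adds some context padding around each box before warping; that detail is omitted here.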
Each warped region is then forwarded through a CNN (such as AlexNet) to extract a feature vector. Class-specific SVMs use these features to classify each region, and a bounding-box regressor predicts a correction for each box.
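As a rough sketch of the classification step, scoring every region with one linear SVM per class is a single matrix product. This assumes 4096-dimensional features (AlexNet's fc7 size) and 20 classes (as in PASCAL VOC). The weights here are random stand-ins, not trained SVMs, and the function name svm_scores is hypothetical.

```python
import numpy as np

def svm_scores(features, W, b):
    """Score every region against every class with one linear SVM per class.

    features: (num_regions, feat_dim) CNN features, one row per warped region
    W:        (feat_dim, num_classes) SVM weight vectors, one column per class
    b:        (num_classes,) SVM biases
    Returns a (num_regions, num_classes) matrix of decision values."""
    return features @ W + b

rng = np.random.default_rng(0)
features = rng.standard_normal((2000, 4096)).astype(np.float32)  # stand-in for CNN features
W = rng.standard_normal((4096, 20)).astype(np.float32) * 0.01    # hypothetical, untrained weights
b = np.zeros(20, dtype=np.float32)

scores = svm_scores(features, W, b)
# Each class SVM is a binary "this class vs. not" decision:
# a positive value means the region is predicted to contain that class.
is_positive = scores > 0
```

Because each class has its own binary SVM, a region can score positively for several classes. In the full pipeline, overlapping detections are then pruned per class with non-maximum suppression.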
This regression acts as a correction to the proposed region, which may be roughly in the right place but not at the exact position and size.
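The correction can be made concrete with the box parameterization from the R-CNN paper. Boxes are written as (center_x, center_y, width, height). The center shift is measured relative to the proposal's size, and the width and height are scaled in log space. In the paper, the offsets are predicted from CNN features by a class-specific linear regressor. In this sketch, the exact targets are plugged back in to show the round trip. The function names and box values are hypothetical.

```python
import numpy as np

def regression_targets(P, G):
    """Offsets (tx, ty, tw, th) that map proposal P onto ground-truth box G.
    Boxes are (center_x, center_y, width, height)."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)])

def apply_regression(P, d):
    """Refine proposal P with predicted offsets d = (dx, dy, dw, dh)."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    return np.array([pw * dx + px, ph * dy + py,
                     pw * np.exp(dw), ph * np.exp(dh)])

proposal = (100.0, 80.0, 50.0, 40.0)      # roughly right, but shifted and too small
ground_truth = (110.0, 85.0, 60.0, 48.0)
t = regression_targets(proposal, ground_truth)
refined = apply_regression(proposal, t)
print(np.allclose(refined, ground_truth))  # True
```

Normalizing by the proposal's width and height makes the targets independent of the box's absolute scale. The log keeps the predicted width and height positive.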
Although the model produces good results, it suffers from one main issue: it is slow and computationally expensive.
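Imagine that, in a typical case, we produce about 2000 regions per image. Each of them must be warped and forwarded through the CNN separately, both during training and at test time. During training, the extracted features also have to be stored on disk before the SVMs and the bounding-box regressors can be trained on them.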
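A back-of-the-envelope calculation shows why this design is costly. It assumes 2000 regions per image, 4096-dimensional float32 features (AlexNet's fc7 output) and a hypothetical dataset of 5000 images. These figures are illustrative assumptions, not measurements.

```python
def rcnn_cost(num_images, regions_per_image=2000, feature_dim=4096, bytes_per_float=4):
    """Return (CNN forward passes, bytes of cached features) for a dataset,
    assuming one forward pass per warped region and float32 features.
    All default values are illustrative assumptions."""
    passes = num_images * regions_per_image
    feature_bytes = passes * feature_dim * bytes_per_float
    return passes, feature_bytes

passes, feature_bytes = rcnn_cost(5000)      # hypothetical dataset size
print(passes)                                # 10000000 forward passes
print(feature_bytes / 1e9, "GB on disk")     # 163.84 GB on disk
```

A single image already needs 2000 CNN passes and about 33 MB of features under these assumptions. Detectors that run the CNN once per image and share that computation across all regions avoid most of this cost.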