Loss Function for Object Detection Regression

In object detection, loss = classification loss + bounding box regression loss

Bounding box regression loss development:

Smooth L1 loss --> IoU loss --> GIoU loss --> DIoU loss --> CIoU loss

1. Smooth L1 loss (proposed in Fast RCNN):

x: the difference between a predicted bounding box coordinate and the ground truth;

L1 = |x|
L2 = x^2

smoothL1(x) = 0.5x^2     if |x| < 1
            = |x| - 0.5  otherwise

Derivatives of the three loss functions above:

dL1(x)/dx = 1    if x >= 0
          = -1   otherwise

dL2(x)/dx = 2x

dsmoothL1(x)/dx = x    if |x| < 1
                = ±1   otherwise

From the above, the derivative of the L1 loss is constant. When x becomes small late in training and the learning rate stays the same, the loss fluctuates around a certain value and cannot converge to a more precise result.

The L2 loss becomes very large when x is large, making it unstable in early training.

Smooth L1 combines the advantages of both: it is L2-like (quadratic) near zero, so the gradient shrinks as the error does, and L1-like (linear, bounded gradient) elsewhere, so large errors do not destabilize training.
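As a quick sanity check, the piecewise definition above can be written directly in Python (a minimal sketch; the threshold of 1 matches the formula above, though deep learning frameworks often expose it as a configurable beta parameter):

```python
def smooth_l1(x):
    """smoothL1(x) = 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

# Quadratic (L2-like) near zero, linear (L1-like) with bounded slope elsewhere:
print(smooth_l1(0.5))   # 0.125
print(smooth_l1(2.0))   # 1.5
```

Note that the two pieces meet smoothly at |x| = 1, where both the value (0.5) and the derivative (±1) agree.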


Weakness:

1) The loss functions above treat the coordinates of the predicted bounding box as independent, while they are actually correlated.

2) Object detection is evaluated with IoU, which is a different objective from coordinate regression. Different bounding boxes may have the same smooth L1 loss while their IoU values vary a lot.

3) L1- and L2-based loss functions are not scale-invariant.


2. IoU loss (ACM MM 2016, Megvii, https://arxiv.org/pdf/1608.01471.pdf):


In Figure 1 of the paper, the green box is the ground-truth bounding box and the blue box is the predicted bounding box. The IoU loss first computes the IoU of the two boxes and then takes -ln(IoU); in practice, IoU Loss = 1 - IoU is often used instead.
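A minimal sketch of the IoU loss for axis-aligned boxes, assuming the (x1, y1, x2, y2) corner format (the box format and function names here are illustrative, not from the paper):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, gt):
    # the practical 1 - IoU form; the original paper uses -ln(IoU)
    return 1.0 - iou(pred, gt)
```

For two disjoint boxes this loss is stuck at its maximum of 1 regardless of how far apart they are, which is exactly the first weakness listed below.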


Weakness:

1) When the predicted bounding box and the ground-truth bounding box do not intersect, IoU(A, B) = 0 says nothing about how far apart boxes A and B are. In this case the loss provides no useful gradient, so such predictions cannot be optimized;

2) Predicted boxes of fixed size can have the same IoU with the ground truth while overlapping it in very different ways, so the IoU value alone cannot tell how the two boxes are aligned.




3. GIoU loss (Generalized-IoU loss):

GIoU(A, B) = IoU(A, B) - |C \ (A ∪ B)| / |C|, where C is the smallest enclosing box of A and B.

Attributes of GIoU:

1)  Lgiou = 1-GIoU
2)  GIoU is scale-invariant
3) For any A and B, GIoU(A, B) <= IoU(A, B) and 0 <= IoU(A, B) <= 1,
so -1 < GIoU(A, B) <= 1, and as A -> B, both GIoU and IoU -> 1
4) When A and B do not intersect, GIoU(A, B) -> -1 as the two boxes move infinitely far apart

Weakness:
When the predicted bounding box lies entirely inside the ground-truth box, the enclosing box C equals their union, so GIoU degenerates to IoU and cannot distinguish where the predicted box sits inside the ground truth.
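Under the same assumed (x1, y1, x2, y2) corner format, the GIoU loss can be sketched as follows; it subtracts from IoU the fraction of the smallest enclosing box C not covered by the union, which yields a nonzero gradient even for disjoint boxes:

```python
def _iou_and_union(a, b):
    """Return (IoU, union area) of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return (inter / union if union > 0 else 0.0), union

def giou(a, b):
    """GIoU = IoU - |C \\ (A u B)| / |C|, C the smallest enclosing box."""
    iou_val, union = _iou_and_union(a, b)
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou_val - (c_area - union) / c_area

def giou_loss(pred, gt):
    return 1.0 - giou(pred, gt)
```

For two disjoint unit boxes one unit apart, GIoU is already negative, so moving the prediction closer reduces the loss, unlike plain IoU loss.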



A good bounding box regression loss should take three factors into account: overlap (IoU), center-point distance, and aspect ratio.

By combining IoU with the center-point distance, DIoU loss converges faster than GIoU loss.

4. DIoU loss (Distance-IoU loss):

Usually an IoU-based loss can be defined as

L = 1 - IoU + R(B, Bgt)

with R(B, Bgt) as a penalty term on the predicted bounding box B and the target box Bgt.

In DIoU,

R_DIoU = ρ^2(b, bgt) / c^2

where b and bgt are the centers of B and Bgt, ρ(·) is the Euclidean distance, and c is the diagonal length of the smallest box enclosing both B and Bgt.

Besides, DIoU can replace IoU in the NMS algorithm, giving DIoU-NMS.

Attributes of DIoU:

1) DIoU is scale-invariant
2) When two boxes completely overlap, Liou = Lgiou = Ldiou = 0; when they are infinitely far apart, Lgiou = Ldiou -> 2
3) DIoU loss converges much faster than GIoU loss
4) When one box contains the other, DIoU loss still converges fast, while GIoU loss degenerates to IoU loss in this situation
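A sketch of the DIoU loss under the same assumed (x1, y1, x2, y2) corner format, with b and bgt recovered as box centers and c as the enclosing-box diagonal:

```python
def diou_loss(pred, gt):
    """L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2 for (x1, y1, x2, y2) boxes."""
    # IoU
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    iou_val = inter / union if union > 0 else 0.0
    # squared distance between the box centers (rho^2)
    bx, by = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (bx - gx) ** 2 + (by - gy) ** 2
    # squared diagonal of the smallest enclosing box (c^2)
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return 1.0 - iou_val + (rho2 / c2 if c2 > 0 else 0.0)
```

Even when the prediction is fully inside the ground truth (where GIoU adds nothing over IoU), the ρ^2/c^2 term keeps pulling the centers together.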

5. CIoU loss (Complete-IoU loss):

CIoU adds an aspect-ratio consistency term v on top of DIoU:

L_CIoU = 1 - IoU + ρ^2(b, bgt) / c^2 + αv

where v = (4/π^2) * (arctan(wgt/hgt) - arctan(w/h))^2 measures the consistency of the two boxes' aspect ratios, and α = v / ((1 - IoU) + v) is a positive trade-off weight.
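A sketch of the CIoU loss, again assuming the (x1, y1, x2, y2) corner format, with v = (4/π^2)(arctan(wgt/hgt) - arctan(w/h))^2 and α = v / ((1 - IoU) + v):

```python
import math

def ciou_loss(pred, gt):
    """L_CIoU = 1 - IoU + rho^2/c^2 + alpha*v for (x1, y1, x2, y2) boxes."""
    # IoU
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wgt, hgt = gt[2] - gt[0], gt[3] - gt[1]
    union = w * h + wgt * hgt - inter
    iou_val = inter / union if union > 0 else 0.0
    # normalized center distance (the DIoU term)
    rho2 = ((pred[0] + pred[2] - gt[0] - gt[2]) ** 2
            + (pred[1] + pred[3] - gt[1] - gt[3]) ** 2) / 4.0
    c2 = ((max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2
          + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2)
    # aspect-ratio consistency term v and its trade-off weight alpha
    v = 4.0 / math.pi ** 2 * (math.atan(wgt / hgt) - math.atan(w / h)) ** 2
    alpha = v / ((1.0 - iou_val) + v) if (1.0 - iou_val) + v > 0 else 0.0
    return 1.0 - iou_val + (rho2 / c2 if c2 > 0 else 0.0) + alpha * v
```

When the boxes coincide, all three penalty terms vanish and the loss is 0; a prediction with the wrong aspect ratio is penalized even if its center and overlap are already good.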






