Semantic Segmentation

Adapted from https://www.jeremyjordan.me/semantic-segmentation/


The goal of semantic image segmentation is to label each pixel of an image with a corresponding class of what is being represented. Because we're predicting for each pixel in the image, this task is commonly referred to as dense prediction.
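Concretely (a toy sketch assuming PyTorch; the 1x1 convolution below is only a stand-in for a real segmentation network), dense prediction means the output keeps the spatial dimensions, with one class score per pixel rather than one per image:

```python
import torch
import torch.nn as nn

num_classes = 21
# Stand-in for a real segmentation network: maps RGB to per-pixel class scores.
model = nn.Conv2d(3, num_classes, kernel_size=1)

image = torch.randn(1, 3, 256, 256)   # (batch, RGB, H, W)
logits = model(image)                 # (batch, num_classes, H, W)
label_map = logits.argmax(dim=1)      # (batch, H, W): one class label per pixel
print(logits.shape, label_map.shape)
```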

Earlier layers tend to learn low-level concepts while later layers develop more high-level feature mappings.

Drozdzal et al. swap out the basic stacked convolution blocks in favor of residual blocks. These residual blocks introduce short skip connections alongside the existing long skip connections between corresponding encoder and decoder feature maps found in the standard U-Net structure. They report that the short skip connections allow for faster convergence when training and allow deeper models to be trained.
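A minimal sketch of such a block (assuming PyTorch; the module names and channel sizes are my own, not the paper's exact configuration). The addition at the end is the short skip, while the usual U-Net concatenation between encoder and decoder feature maps remains the long skip:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Short skip connection: output = F(x) + x
        return self.relu(self.body(x) + x)

x = torch.randn(1, 64, 128, 128)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```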

Expanding on this, Jegou et al. proposed the use of dense blocks, still following a U-Net structure, arguing that the "characteristics of DenseNets make them a very good fit for semantic segmentation as they naturally induce skip connections and multi-scale supervision." These dense blocks are useful as they carry low-level features from previous layers directly alongside higher-level features from more recent layers, allowing for highly efficient feature reuse.
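A sketch of a dense block (again assuming PyTorch; the growth rate and layer count are illustrative, not Jegou et al.'s exact settings). Each layer receives the concatenation of all earlier feature maps, so low-level features travel forward directly alongside newer, higher-level ones:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=16, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Each layer sees every earlier feature map, concatenated on channels.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

x = torch.randn(1, 32, 64, 64)
print(DenseBlock(32)(x).shape)  # torch.Size([1, 96, 64, 64]): 32 + 4 * 16 channels
```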


Downsampling a feature map broadens the receptive field of the following filters, but it comes at the cost of reduced spatial resolution.
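For example (a sketch assuming PyTorch), a 2x2 max pool halves the resolution, so a 3x3 filter applied afterwards covers roughly twice as wide a region of the original input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)
pooled = nn.MaxPool2d(kernel_size=2)(x)
print(pooled.shape)  # torch.Size([1, 64, 64, 64]): half the spatial resolution,
                     # so subsequent filters see a wider region of the input
```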

Dilated convolutions provide an alternative approach for gaining a wide field of view while preserving the full spatial dimensions.

However, it is often too computationally expensive to completely replace pooling layers with dilated convolutions.
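A small sketch (assuming PyTorch) comparing a standard 3x3 convolution with a dilated one: with dilation 4 the same nine weights cover a 9x9 neighbourhood, and with matching padding the spatial resolution is unchanged:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 64, 64)

# Standard 3x3 convolution: each output pixel sees a 3x3 neighbourhood.
standard = nn.Conv2d(1, 1, kernel_size=3, padding=1)

# Dilated 3x3 convolution: the same 9 weights are spread over a 9x9
# neighbourhood; padding = dilation keeps the output size unchanged.
dilated = nn.Conv2d(1, 1, kernel_size=3, padding=4, dilation=4)

print(standard(x).shape)  # torch.Size([1, 1, 64, 64])
print(dilated(x).shape)   # torch.Size([1, 1, 64, 64])
```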


Cross-entropy loss becomes a problem when classes have unbalanced representation in the image, since the per-pixel average is dominated by the most prevalent classes.
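Two common remedies, sketched below assuming PyTorch (the class weights here are made-up illustrative values): weighting the cross-entropy loss inversely to class frequency, or using a soft Dice loss, which measures per-class overlap and is therefore less sensitive to how many pixels each class occupies:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(2, 3, 32, 32)           # (batch, classes, H, W)
target = torch.randint(0, 3, (2, 32, 32))    # per-pixel class labels

# Remedy 1: weight each class inversely to its pixel frequency.
class_weights = torch.tensor([0.2, 1.0, 2.5])  # illustrative values only
weighted_ce = nn.CrossEntropyLoss(weight=class_weights)
print(weighted_ce(logits, target))

# Remedy 2: soft Dice loss, computed per class and averaged.
def soft_dice_loss(logits, target, eps=1e-6):
    num_classes = logits.shape[1]
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    cardinality = (probs + one_hot).sum(dims)
    dice = (2 * intersection + eps) / (cardinality + eps)
    return 1 - dice.mean()

print(soft_dice_loss(logits, target))
```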

