Reading TransFG

TransFG:

    1) verifies the effectiveness of the vision transformer on fine-grained visual classification (FGVC), offering an alternative to the dominant CNN-backbone-with-RPN model designs

    2) naturally focuses on the most discriminative regions of objects and achieves SOTA performance

    3) visualizations illustrate the model's ability to capture discriminative image regions


Methods:

    1) vision transformer as feature extractor

        image sequentialization: first preprocess the input image into a sequence of flattened patches, generated with a sliding window so that neighboring patches overlap and local discriminative regions are not split apart

    2) TransFG architecture: proposes the Part Selection Module (PSM), which uses accumulated attention weights to pick the most discriminative tokens as input to the last transformer layer, and applies contrastive feature learning to enlarge the distance between representations of similar sub-categories

    3) contrastive feature learning: minimizes the similarity of classification tokens from samples with different labels and maximizes the similarity of classification tokens from samples with the same label
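
The image sequentialization in step 1 can be sketched as follows (a minimal numpy sketch; patch size 16 and stride 12 are illustrative values, not necessarily the paper's exact configuration):

```python
import numpy as np

def sequentialize(image, patch_size=16, stride=12):
    """Split an (H, W, C) image into flattened overlapping patches
    using a sliding window; stride < patch_size makes patches overlap."""
    H, W, C = image.shape
    patches = []
    for top in range(0, H - patch_size + 1, stride):
        for left in range(0, W - patch_size + 1, stride):
            patch = image[top:top + patch_size, left:left + patch_size]
            patches.append(patch.ravel())
    return np.stack(patches)  # (N, patch_size * patch_size * C)

img = np.random.rand(224, 224, 3)
seq = sequentialize(img)
print(seq.shape)  # (324, 768): ((224-16)//12 + 1)**2 patches of 16*16*3 values
```

Each row of the output is one flattened patch, ready for the linear projection into token embeddings.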
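
The Part Selection Module in step 2 can be sketched roughly like this, assuming attention weights are accumulated by matrix-multiplying them across layers and the patch token most attended by the classification token is kept per head (a simplified sketch, not the paper's full implementation):

```python
import numpy as np

def part_selection(attn_per_layer, tokens):
    """Select discriminative tokens via accumulated attention.

    attn_per_layer: list of (heads, N+1, N+1) attention matrices,
                    with the classification token at index 0.
    tokens: (N+1, D) hidden states entering the last layer.
    Returns the CLS token plus one selected patch token per head.
    """
    total = attn_per_layer[0]
    for a in attn_per_layer[1:]:
        total = a @ total                # accumulate attention across layers
    cls_attn = total[:, 0, 1:]           # (heads, N): CLS attention to patches
    idx = cls_attn.argmax(axis=1) + 1    # top patch token index per head
    return np.concatenate([tokens[:1], tokens[idx]], axis=0)

rng = np.random.default_rng(0)
attn = [rng.random((4, 9, 9)) for _ in range(3)]  # 3 layers, 4 heads, 8 patches
tokens = rng.random((9, 16))
out = part_selection(attn, tokens)
print(out.shape)  # (5, 16): CLS token + 4 selected tokens
```

The selected tokens, together with the CLS token, form the input sequence to the final transformer layer, discarding the less informative patches.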
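
The contrastive objective in step 3 can be sketched as below, assuming cosine similarity on L2-normalized classification tokens and a hinge on negative pairs (the margin value is illustrative):

```python
import numpy as np

def contrastive_loss(z, labels, margin=0.4):
    """Pull CLS tokens with the same label together and push those with
    different labels apart beyond a margin (margin is an assumed value)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T                                     # pairwise cosine sims
    same = labels[:, None] == labels[None, :]
    pos = (1.0 - sim)[same].sum()                     # same-label: want sim -> 1
    neg = np.maximum(sim - margin, 0.0)[~same].sum()  # diff-label: want sim < margin
    return (pos + neg) / (len(z) ** 2)

# Identical same-class tokens and orthogonal cross-class tokens give zero loss.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 0, 1])
print(contrastive_loss(z, labels))  # 0.0
```

Averaging over all pairs keeps the loss scale independent of batch size.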

    Paper source: https://arxiv.org/pdf/2103.07976.pdf

He, J., Chen, J.N., Liu, S., Kortylewski, A., Yang, C., Bai, Y., Wang, C. and Yuille, A., 2021. TransFG: A Transformer Architecture for Fine-grained Recognition. arXiv preprint arXiv:2103.07976.
