Reading CLIP

Learning directly from raw text about images is promising which leverages a much broader source of supervision.

Task: predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of pairs.

After pre-training, natural language is used to is used to reference learned visual concepts enabling zero-shot transfer of the model to downstream tasks.


 [Paper Source] 

https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language.pdf

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger, G., 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.

Comments

Popular posts from this blog

Reading CutPaste

OOD-related papers