Reading CLIP
Learning directly from raw text about images is promising because it leverages a much broader source of supervision than fixed sets of predetermined object categories.
Task: predicting which caption goes with which image. This simple contrastive pre-training task is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
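A minimal PyTorch sketch of this contrastive objective, loosely following the numpy-style pseudocode in the paper's Figure 3. The batch size, embedding dimension, and fixed temperature below are illustrative placeholders (CLIP learns the temperature), and random tensors stand in for the encoder outputs:

    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
        # image_emb, text_emb: [n, d]; row i of each side is a matched pair.
        # L2-normalize so a dot product equals cosine similarity.
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)

        # n x n matrix of scaled pairwise similarities.
        logits = image_emb @ text_emb.t() / temperature

        # The n correct pairings lie on the diagonal.
        labels = torch.arange(logits.size(0), device=logits.device)

        # Symmetric cross-entropy: image->text and text->image, averaged.
        loss_i = F.cross_entropy(logits, labels)
        loss_t = F.cross_entropy(logits.t(), labels)
        return (loss_i + loss_t) / 2

    # Random embeddings standing in for image/text encoder outputs.
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(clip_contrastive_loss(img, txt))

Within a batch of n pairs, the model must pick out the n real pairings among the n x n possible ones, which is what makes the objective scale with batch size.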
After pre-training, natural language is used to reference learned visual concepts (or describe new ones), enabling zero-shot transfer of the model to downstream tasks.
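As a usage-level illustration of zero-shot transfer, here is a classification sketch using the openai/CLIP reference implementation (installable from github.com/openai/CLIP); the image file name and candidate prompts are made-up placeholders:

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Placeholder inputs: any image plus natural-language class prompts.
    image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
    texts = clip.tokenize(["a photo of a dog",
                           "a photo of a cat",
                           "a photo of a car"]).to(device)

    with torch.no_grad():
        # Scaled cosine similarities between the image and each prompt.
        logits_per_image, logits_per_text = model(image, texts)
        probs = logits_per_image.softmax(dim=-1)

    print(probs)  # the highest-probability prompt is the predicted class

No task-specific training data is needed; swapping in a different prompt list re-targets the classifier to a new label set.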
[Paper Source]
https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language.pdf
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G. and Sutskever, I., 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.