Kobus Barnard and David Forsyth, "Learning the Semantics of Words and Pictures", International Conference on Computer Vision, vol. 2, pp. 408-415, 2001.
We present a statistical model for organizing image collections which integrates semantic information provided by associated text and visual information provided by image features. The model is very promising for information retrieval tasks such as database browsing and searching for images based on text and/or image features. Furthermore, since the model learns relationships between text and image features, it can be used for novel applications such as associating words with pictures, and unsupervised learning for object recognition.
Keywords: image features, text and images, image semantics, learning, statistical models, latent semantic analysis