Kobus Barnard, Pinar Duygulu, Nando de Freitas, and David Forsyth, "Object Recognition as Machine Translation - Part 2: Exploiting Image Database Clustering Models", unpublished manuscript.
We treat object recognition as a process of attaching words to images and image regions. To accomplish this we exploit clustering methods which learn the joint statistics of words and image regions. We show how these models can then be used to attach words to images outside the training set. This "auto-annotation" process has applications such as image indexing, as well as being related to object recognition. Predicted words can be compared to actual words associated with images in a held-out set, and we introduce several performance measures based on this observation. These measures are then used to make principled comparisons of model variants and proposed enhancements.
Word prediction is most simply done as a function of the entire image. However, for recognition we need to learn the correspondence between words and specific image regions. Here we first show that the existing models can be used for this purpose, and then we propose modifications to improve performance based on this goal. Finally, we propose word prediction performance as a segmentation measure and report the results for two segmentation approaches.
Keywords: Object recognition, segmentation, correspondence