Pinar Duygulu, Kobus Barnard, Nando de Freitas, and David Forsyth, Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary, Seventh European Conference on Computer Vision, pp IV:97-112, 2002
(The appropriate archival reference for this data).
(Gzipped tar ball with README file)        
(The following description is contained in the README file)
This directory contains the data used for the ECCV 2002 paper, "Object Recognition as Machine Translation", by Pinar Duygulu, Kobus Barnard, Nando de Freitas, and David Forsyth. The data is very much cobbled together and has some anomalies. More carefully prepared data will be made available in the future.
Each image segment is represented by 36 features. Since each image has a different number of segments, we list the number of segments used in separate files, so that the entire set of image segments can be read into a single Matlab file.
To compute the color features the images were linearized on the basis that they were PCD images, and then for convenience they were scaled up by (255/107), a somewhat arbitrary factor which has some justification based on the PCD format. (In hindsight, a factor of 2 would make more sense, but using this, or any other factor, would not change anything). Note that the features are redundant. Note also that the RGB and L*a*b features were duplicated to increase their weight for a specific experiment (long since finished), and we did not subsequently remove the duplicated columns. I do not know if this duplication inadvertently helps, hinders, or has no effect on the ECCV experiments. However, if you need a non-singular feature matrix, you will have to remove them. The 36 features are:
area, x, y, boundary_len^2/area, convexity, moment-of-inertia (6)
ave RGB (3)
ave RGB (3, yes, duplicated!)
RGB stdev (3)
ave L*a*b (3)
ave L*a*b (3, yes, duplicated!)
L*a*b stdev (3)
mean oriented energy, 30 degree increments (12)
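For readers who need a non-singular feature matrix, the duplicated columns can be dropped with a few lines of code. The following is a minimal Python sketch, not part of the original distribution. It assumes that the 36 columns appear in the files in exactly the order listed above (the README does not state this explicitly) and that each row has already been parsed into a list of 36 numbers. The name drop_duplicate_columns is hypothetical.

```python
# Assumed 0-based column layout, taken from the order of the feature list:
#   0-5   shape features   6-8   ave RGB     9-11  ave RGB (duplicate)
#   12-14 RGB stdev        15-17 ave L*a*b   18-20 ave L*a*b (duplicate)
#   21-23 L*a*b stdev      24-35 mean oriented energy
DUPLICATE_COLUMNS = set(range(9, 12)) | set(range(18, 21))


def drop_duplicate_columns(row):
    """Return a copy of one 36-feature row without the 6 duplicated columns."""
    if len(row) != 36:
        raise ValueError("expected 36 features, got %d" % len(row))
    return [value for i, value in enumerate(row) if i not in DUPLICATE_COLUMNS]
```

Applied to every row, this leaves 30 columns. It is worth checking on the real data that the dropped columns are in fact identical to the retained ones before relying on this layout.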
The files are as follows.
words
The vocabulary used. We count the words starting at 1, so "city" is
word 1.
blob_counts
test_1_blob_counts
One number per line for the 4500 training / 500 test_1 images,
giving the number of blobs used for that image.
blobs
test_1_blobs
The features for the blobs for the 4500 training / 500 test_1
images, listed in order of images, then decreasing blob size. In
order to tell which blob goes with which image, you need either the
file blob_counts, or the file document_blobs.
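As an illustration of how blob_counts ties the rows of blobs to images, here is a minimal Python sketch. It assumes the rows of blobs and the numbers in blob_counts have already been read into lists in file order. The function name group_blobs_by_image is hypothetical.

```python
def group_blobs_by_image(blob_rows, blob_counts):
    """Split a flat list of blob rows into one list per image.

    blob_rows   -- all rows of the blobs file, in file order
    blob_counts -- one integer per image, in the same image order
    """
    if sum(blob_counts) != len(blob_rows):
        raise ValueError("blob counts do not add up to the number of blob rows")
    groups = []
    start = 0
    for count in blob_counts:
        # The next `count` rows belong to the current image.
        groups.append(blob_rows[start:start + count])
        start += count
    return groups
```

The same grouping should apply to cluster_membership, since it has one entry per line of blobs.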
document_blobs
test_1_document_blobs
(EDITED April 4, 2004: The original wording suggested
that these files supplied the blob tokens. However, these files
simply point to the actual blobs. To get the tokens that were used
for the ECCV 2002 paper, consult the files cluster_membership and
test_1_cluster_membership.)
The blobs for each of the 4500 training / 500 test_1 images. Each
line has a list of numbers representing indices into the file
"blobs". If the image has fewer blobs than the maximum, the row is
padded with -99's so that the file can be read as a Matlab matrix.
(The names of these files are somewhat misleading because they are
not exactly analogous with the files document_words and
test_1_document_words. These files do not give you any more
information than what is available in blob_counts and
test_1_blob_counts.)
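Outside Matlab, the -99 padding has to be stripped by hand. A minimal Python sketch follows, assuming the values on each line are separated by whitespace and may be written either as integers or in floating-point notation (both are assumptions, since the README does not describe the text format). The name parse_padded_row is hypothetical.

```python
def parse_padded_row(line, pad=-99):
    """Parse one line of a padded index file and drop the padding entries."""
    # int(float(...)) accepts both "12" and "1.2e+01" style tokens.
    values = [int(float(token)) for token in line.split()]
    return [value for value in values if value != pad]
```

The returned numbers are left exactly as they appear in the file. The README states that word indices start at 1 but does not say so for blob indices, so check this before using them as offsets.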
cluster_membership
test_1_cluster_membership
The blob token associated with each line of the file blobs and
test_1_blobs.
document_words
test_1_document_words
The words for each of the 4500 training / 500 test_1 images. Each
line has a list of numbers which are indices into the vocabulary
file "words". Counting starts at 1. If the image has fewer words
than the maximum, the row is padded with -99's so that the file can
be read as a Matlab matrix.
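Because the word indices count from 1, they need to be shifted by one in languages that index from 0. A minimal Python sketch, assuming the vocabulary has been read into a list in file order and a row of document_words has been parsed into integers; the name words_for_image is hypothetical:

```python
PAD = -99  # padding value used in document_words


def words_for_image(row, vocabulary):
    """Map one row of document_words to word strings, skipping padding."""
    # Indices in the file start at 1, Python lists start at 0.
    return [vocabulary[index - 1] for index in row if index != PAD]
```

For example, with a vocabulary list whose first entry is "city", the row 1 -99 -99 maps to the single word "city".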
word_counts
test_1_word_counts
The number of words for each of the 4500 training / 500 test_1
images.
image_nums
test_1_image_nums
The Corel image number for the 4500 training / 500 test_1 images.
We are unable to distribute the actual images due to copyright
restrictions. The data can be used to some extent without the
images. We provide the image numbers for those who have access to
the Corel images.
Segmentation masks.
The directory seg_masks contains files of the form
m_[corel_num]_region_map.mat.gz
These files are Matlab integer matrices which give the region number for
each pixel in the image. We use 0 for unassigned. Since the segmentation
software we used stripped the outer 10 pixels of each image, these border
pixels are always 0.
IMPORTANT NOTE: In the segmentation masks, the regions are in arbitrary
order, whereas in the data sets the regions are ordered by size. There
may be more regions in the segmentation masks, because we only used the
10 largest regions for the data. The
main purpose of the masks is so that those who have the images can use
the same regions but different features. (Since we used very simple
features, and ignored the surrounding regions, we are confident that
there is much scope for improvement in these directions.)
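To use the masks with new features, the region numbers have to be put back into the size order used by the data files. The following Python sketch shows one way to do this. It assumes that "size" means pixel count within the mask and that the mask has already been loaded into a list of rows of integers. Loading the gzipped Matlab file is not shown. How ties were broken in the original data is not documented, so the tie-breaking rule here is arbitrary. The name largest_regions is hypothetical.

```python
from collections import Counter


def largest_regions(mask, keep=10):
    """Return region numbers by decreasing pixel count, at most `keep` of them.

    mask -- list of rows, each a list of integer region numbers (0 = unassigned)
    """
    counts = Counter(label for row in mask for label in row if label != 0)
    # Largest region first; equal sizes are ordered by region number (assumed).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:keep]]
```

Comparing the resulting pixel counts against the area feature in blobs for a few images is a reasonable sanity check on the assumed ordering.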