We manually ground-truthed the position on the image plane of seven object classes: beds, cabinets, couches, tables, doors, windows, and picture frames. For each image, we prepared a binary mask in which the pixels belonging to the object of interest are white and all other pixels are black.

Masks are named according to the following convention:

    <image number>_<object class>_<instance index>.jpg

For example, the mask for the first table in image 180 is 180_table_0.jpg. If that image contains more than one table, you will also find 180_table_1.jpg, and so on. If an image contains no object of a given category, the archive contains no mask for that category and image.

Note that when we prepared the masks we treated the objects as if they were not occluded: pixels belonging to the object of interest that are occluded by some other object are still white. We also did not include objects that occupy less than 1% of the image plane.

During evaluation, we compare the position predicted by our model with the ground-truth position. If the intersection of the two areas is greater than 50% of their union (i.e., intersection over union above 0.5), we consider the prediction a correct detection.

The masks in this archive were used for the experiments in our CVPR 2012 paper [1].

[1] Luca Del Pero, Joshua Bowdish, Daniel Fried, Bonnie Kermgard, Emily Hartley, and Kobus Barnard, "Bayesian geometric modeling of indoor scenes", CVPR 2012.
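As an illustration of the evaluation criterion, the sketch below loads a ground-truth mask and checks whether a predicted mask counts as a correct detection. It assumes Python with NumPy and Pillow (not part of the archive), and the function names load_mask and is_correct_detection are ours, chosen for illustration only.

    import numpy as np
    from PIL import Image

    def load_mask(path):
        """Load a mask image and binarize it (white pixels = object).

        JPEG compression can blur mask edges, so we threshold at 128
        to recover a clean binary mask.
        """
        gray = np.asarray(Image.open(path).convert("L"))
        return gray > 128

    def is_correct_detection(pred_mask, gt_mask):
        """Return True if intersection-over-union of the two masks exceeds 0.5."""
        intersection = np.logical_and(pred_mask, gt_mask).sum()
        union = np.logical_or(pred_mask, gt_mask).sum()
        return union > 0 and intersection / union > 0.5

    # Example: compare a hypothetical predicted mask against the mask of the
    # first table in image 180.
    # gt = load_mask("180_table_0.jpg")
    # print(is_correct_detection(pred_mask, gt))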