Spring 2008 - CS477/577 - An Introduction to Computer Vision

Assignment Six

Due: Late Friday Night, April 11, 2008

Credit: Approximately 8 points for UGRADS and 6 points for GRADS (Relative, and very rough absolute weighting)

This assignment must be done individually


You can do this assignment in either Matlab or C/C++.

Information for those working in C/C++ (NOT updated for this assignment).


Assignment specification

(UGRADS/GRADS are responsible for all parts)

The following image pairs each consist of a high quality image of a PowerPoint slide and a low quality image that is a frame from a video of a presentation that used that slide.

slide1.pgm         frame1.pgm

slide2.pgm         frame2.pgm

slide3.pgm         frame3.pgm

The directory /cs/www/classes/cs477/spring08/ua_cs_only/assignments/siftDemoV4 contains (just for reference) the contents of a zip file available from http://www.cs.ubc.ca/~lowe/keypoints, which provides an implementation of David Lowe's SIFT keypoint finder. The README and the executable are in that directory. Or, more conveniently, use:

    /cs/www/classes/cs477/spring08/ua_cs_only/assignments/siftDemoV4/sift
Note that what you need to do for this assignment is quite close to the example code in match.m or match.c, but you are responsible for your own implementation. You should consult those files only minimally, if you are stuck and need some ideas. However, you can make use of the code in "sift.m" or "sift.c" to help with parsing the output of the program "sift". If you copy code, you must provide attribution!
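For reference, here is a minimal Matlab sketch of one way to run the sift program and parse its output. It assumes the key-file format described in the siftDemoV4 README (a header line giving the number of keypoints and the descriptor length, then for each keypoint its row, column, scale, and orientation followed by 128 descriptor values). The function name read_sift and all variable names are purely illustrative.

    % read_sift.m -- sketch only: run Lowe's sift binary on a PGM image and
    % parse the resulting key file (format per the siftDemoV4 README).
    function [keys, descs] = read_sift(pgmfile)
        siftexe = '/cs/www/classes/cs477/spring08/ua_cs_only/assignments/siftDemoV4/sift';
        keyfile = [pgmfile '.key'];
        system([siftexe ' <' pgmfile ' >' keyfile]);   % run the keypoint finder

        fid = fopen(keyfile, 'r');
        header = fscanf(fid, '%d %d', 2);    % [number of keypoints; descriptor length]
        num = header(1);
        len = header(2);                     % should be 128

        keys  = zeros(num, 4);               % each row: [row col scale orientation]
        descs = zeros(num, len);
        for i = 1:num
            keys(i, :)  = fscanf(fid, '%f', 4)';
            descs(i, :) = fscanf(fid, '%d', len)';
        end
        fclose(fid);

        % Normalize descriptors to unit length so that the cosine comparison
        % used later is just a dot product.
        descs = descs ./ repmat(sqrt(sum(descs.^2, 2)), 1, len);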

  1. For each pair, find the best N matches using nearest-neighbor matching with the Euclidean distance between the 128-element feature vectors. Produce a collage of four images, in the following arrangement:
        slide_1     slide_1
    
        frame_1     frame_1 
    
    where the left two images should show some of the keypoints used in the N matches, with vectors attached to them showing the scale and the orientation. The right two images should have lines connecting each of the N matched keypoint pairs. Call your 3 collages q1a.jpeg, q1b.jpeg, q1c.jpeg.

    Begin by experimenting a bit with N, and see if you can notice that closer matches tend to be better. Choose a value of N that provides many good matches at the expense of also having some bad matches. If your N is too large for clarity, plot every second, third, or fourth match. Provide N and any such stepping in your README. (One way to set up this matching is sketched after this list.)

  2. The same as the above, except measure the similarity between the two 128-element feature vectors as the cosine of the angle between them. Create q2a.jpeg, q2b.jpeg, q2c.jpeg. (A sketch using this measure also appears after this list.)

  3. In the paper, Lowe suggests that the ratio of the distance to the nearest neighbor to the distance to the second-nearest neighbor can be more robust. Try that out using the Euclidean distance. Create q3a.jpeg, q3b.jpeg, q3c.jpeg.

  4. Same as the previous, but for the cosine measure. Create q4a.jpeg, q4b.jpeg, q4c.jpeg.

  5. Comment on the effectiveness of the various methods.

  6. Using your preferred method, use the results above to set a threshold, T, that seems like a good choice for accepting a match as good; it should roughly correspond to getting the N matches used above. Now try matching all slides against all frames, and report the number of matches in a 3x3 "confusion matrix" whose element (i,j) is the number of matches for slide_i with frame_j that pass the threshold T. (A sketch of this loop appears below.)
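One way to organize the nearest-neighbor matching for question 1 is sketched below in Matlab. It assumes descriptors loaded with the read_sift sketch above (unit-length rows); all names and the value of N are illustrative only.

    % Sketch for question 1: Euclidean nearest-neighbor matching, keeping the
    % best N matches between one slide and one frame.
    [k1, d1] = read_sift('slide1.pgm');
    [k2, d2] = read_sift('frame1.pgm');

    N = 50;                                  % tune this by experimentation

    % All pairwise Euclidean distances: dist(i,j) = ||d1(i,:) - d2(j,:)||.
    dist = sqrt(max(0, repmat(sum(d1.^2, 2), 1, size(d2, 1)) ...
                     + repmat(sum(d2.^2, 2)', size(d1, 1), 1) ...
                     - 2 * d1 * d2'));

    % For each slide keypoint, its nearest frame keypoint and that distance.
    [bestDist, bestIdx] = min(dist, [], 2);

    % Keep the N slide keypoints whose nearest-neighbor distance is smallest.
    [sortedDist, order] = sort(bestDist);
    matches = [order(1:N), bestIdx(order(1:N))];   % rows: [slide index, frame index]

The rows, columns, scales, and orientations stored in k1 and k2 are what you need to draw the scale/orientation vectors and the connecting lines in the collages.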

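For questions 2 through 4, the same structure works with a different score; a sketch, continuing from the variables above:

    % Sketch for questions 2-4: cosine measure and the nearest/second-nearest
    % ratio test. With unit-length descriptors, the cosine of the angle between
    % two descriptors is just their dot product (larger means more similar).
    cosSim = d1 * d2';
    [bestSim, bestIdx2] = max(cosSim, [], 2);
    [ignore, order2] = sort(bestSim, 'descend');
    cosMatches = [order2(1:N), bestIdx2(order2(1:N))];

    % Ratio test with Euclidean distance: distance to the nearest neighbor over
    % distance to the second-nearest neighbor (smaller is more distinctive).
    sortedRows = sort(dist, 2);                    % each row in ascending order
    ratio = sortedRows(:, 1) ./ sortedRows(:, 2);
    [ignore, orderR] = sort(ratio);
    ratioMatches = [orderR(1:N), bestIdx(orderR(1:N))];

For question 4, form the analogous ratio from the cosine-based distances (for example, the angles) and rank matches the same way.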

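Finally, for question 6, here is a sketch of the all-pairs matching loop, shown with the Euclidean ratio test; the threshold value 0.6 is only a placeholder, and you should substitute whichever measure and threshold you settled on.

    % Sketch for question 6: count, for every slide/frame pair, the matches
    % that pass the chosen threshold T.
    T = 0.6;
    confusion = zeros(3, 3);
    for i = 1:3
        [ki, di] = read_sift(sprintf('slide%d.pgm', i));
        for j = 1:3
            [kj, dj] = read_sift(sprintf('frame%d.pgm', j));

            % Euclidean distances, then the nearest/second-nearest ratio for
            % each slide keypoint, as in the previous sketch.
            d = sqrt(max(0, repmat(sum(di.^2, 2), 1, size(dj, 1)) ...
                          + repmat(sum(dj.^2, 2)', size(di, 1), 1) ...
                          - 2 * di * dj'));
            s = sort(d, 2);
            confusion(i, j) = sum(s(:, 1) ./ s(:, 2) < T);
        end
    end
    disp(confusion)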
To hand in the above, use the turnin program available on lectura (the turnin key is cs477_hw6). Hand in any code, the jpeg images mentioned above, and a short README.txt with any information you want to share with the grader, including the answers to the last two questions.