Fall 2011 - CS477/577 - An Introduction to Computer Vision

Assignment Six
Due: Late Tuesday Night, November 15, 2011 (before 9am Wednesday MORNING)
Credit: Approximately 8 points for UGRADS and 6 points for GRADS (Relative, and very rough absolute weighting)

This assignment must be done individually

Big picture

This assignment will provide hands on experience using SIFT features to match images. The features themselves will be extracted using a program provided by David Lowe. However, you are responsible for some understanding of the main design criteria for the feature detector.

You can do this assignment in either Matlab or C/C++.

Information for those working in C/C++. (NOT updated for this assignment).

Assignment specification
(UGRADS/GRADS are responsible for all parts)

The following image pairs are a high quality image of a PowerPoint slide, and a low quality image that is a frame of a video of a presentation that used that slide. The format (PGM) is a gray scale image that the SIFT software knows how to read, as does Matlab.

slide1.pgm frame1.pgm

slide2.pgm frame2.pgm

slide3.pgm frame3.pgm

This directory /cs/www/classes/cs477/fall11/ua_cs_only/assignments/siftDemoV4 contains the contents of a zip file available (just for reference). from http://www.cs.ubc.ca/~lowe/keypoints . This contains an implementation of David Lowe's SIFT keypoint finder. You can click here for the README and here for the executable. Or, more conviently, use:

    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/siftDemoV4/sift

Note that what you need to do for this assignment is quite close to the example code in match.m or match.c, but you are responsible for your own implementation. You should consult with those files only minimally if you are stuck and need some ideas. However, you can make use of the code in "sift.m" or "util.c" to help with the parsing of the output of the program "sift". If you copy code, you need to provide attribution!

For each pair, find the best N disjoint (1-1) matches using the nearest neighbor using the Eucludian distance of the 128 element feature vector. Here disjoint means that each keypoint in one set matches at most one keypoint in the other set, and so N cannot be larger than the minimum of the sizes of the two sets.
Produce a collage of four images, in the following arrangement:
```
    slide_1     slide_1

    frame_1     frame_1 
```
where the left two images should show some of the keypoints used in the N matches, with vectors attached to them showing the scale and the orientation. The right two images should have lines connecting each of the N matched keypoint pairs.
Experimenting a bit with N for each image, and see if you can notice that closer matches tend to be better. Choose a value of N that is a fixed percentage P of the number of candidate matches (P should be the same for each image pair) that provides many good matches at the expense of having some bad matches also. Shoot for roughly 2/3 good matches and 1/3 bad matches. If N for a given pair is too large for clarity, plot every second or third or fourth match. Provide P in your report (+).
Your program should display these images and write them as q1a.jpeg, q1b.jpeg, q1c.jpeg. Put them into your report (+) (with a caption).
Using the same P as above, try measuring the distance between matches as the cosine between the two 128 element feature vectors. Display the collages, write then as q2a.jpeg, q2b.jpeg, q2c.jpeg, and put them into your report (+) (with a caption).
In the paper, Lowe suggests that the ratio of the distance to the nearest neighbor and the second nearest neighbor can be more robust. Try that out using Euclidean distance, again using the same P as before. Display the result, write q3a.jpeg, q3b.jpeg, q3c.jpeg, and put them into your report (+) (with a caption).
Same as the previous question, but for the cosine measure. Display the collages, write q4a.jpeg, q4b.jpeg, q4c.jpeg, and put them into your report (+) (with a caption).
Comment on the effectiveness of the various methods in your report (+).
Using your preferred method, now use the results above to set a threshold, T, for that method that seems like a good choice for accepting a match as good. Now try matching all slides to all frames, and report the number of matches in a 3x3 matrix where the element (i,j) is the number of matches T for slide_i with frame_j. Make you put the confusion matrix into your report (+),

Want to hand in

To hand in the above, use the turnin program available on lectura (turnin key is cs477_hw6).

Hand in any code, and a report in PDF with the collages with captions and the answers to the last two questions. The beginning of your report should contain any "meta" issues such skipped questions, or how to run your code if it is not a Matlab program called hw6.m. It should be possible to grade the assignment based on the report, with running the code needed only to investigate problems or check that the code can create the images in the report.

You can assume that the grader will link the input files and the sift executable to your turnin directory. If you like, you can run the sift executable, put the results into files, hand in those files, and your program can work on those files. Explain this sort of thing in your report.

If you are working in C/C++, please provide an executable called hw6.bin.linux and/or hw6.bin.mac, so that I can run it even if your code does not compile,