Fall 2011 - CS477/577 - An Introduction to Computer Vision

Assignment Three

Due: Friday, September 30, 2011, late (i.e., before Saturday 9am)

Credit: Approximately 8 points (Relative, and very rough absolute weighting)

This assignment must be done individually


You can do this assignment in any language you like, with either Matlab or C/C++ together with the vision lab support code being the two recommended choices. Python might be up to the task also, but you will have to sort out numerical library support on your own (or as a group).

Those (considering) working with the vision lab should do at least some of the assignments in C/C++.

Information for those working in C/C++.     (Updated for this assignment).


Assignment specification

This assignment has four main parts (A through D), three of which are required for undergrads; Parts E and F are optional extra credit.

To simplify things, you should hard code the file names in the version of your program that you hand in.

Part A

Overview.   In part A you will explore fitting lines to data points using two different methods, and consider the difference between them. Recall that when there are more than two points, we do not expect a perfect fit, because of noise, even if the points come from a line. Fitting the line to many points mitigates the effect of noise (similar to averaging), provided that the noise is well behaved (no outliers). However, we need to be clear about what line we are looking for (i.e., what the definition of the "best" line is). There are a number of possibilities, and we explored two of them in class.

The file

    line_data.txt 
is an ASCII file containing coordinates of points that are assumed to lie on a line. Apply the two kinds of least squares line fitting to the data. You need to implement the fitting yourself based on formulas given in class.

[ There may be Matlab routines that simplify this question (e.g. REGRESS), but you need to use slightly lower level routines to show that you understand how things are computed. However, you may find it interesting to compare with the output of REGRESS or like routines. ]

Your program should fit lines using both methods, and create two plots (+) showing the points and the fitted lines. For each method you should output the slope, the intercept, and the error of the fit under both models (eight numbers total) (+). In the readme file, comment on how you expect the error under one model to compare with the error of the line found by the alternative model, and vice versa.
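For example, a rough Matlab sketch of the two fits (assuming line_data.txt has one x-y pair per row, and that the two methods from class are ordinary least squares on vertical distances and total least squares on perpendicular distances) might look like:

    % A rough sketch, not a complete solution (no plots, and only one error
    % per fit); it assumes the two methods named in the text above.
    pts = load('line_data.txt');
    x = pts(:,1);  y = pts(:,2);  n = numel(x);

    % Ordinary least squares: minimize the sum of squared vertical distances.
    A = [x, ones(n,1)];
    p = A \ y;                        % p(1) = slope, p(2) = intercept
    err_vert = sum((A*p - y).^2);

    % Total least squares: minimize the sum of squared perpendicular
    % distances. The unit normal of the line is the eigenvector of the
    % centered scatter matrix with the smallest eigenvalue.
    xc = x - mean(x);  yc = y - mean(y);
    [V, D] = eig([xc yc]' * [xc yc]);
    [~, i] = min(diag(D));
    nrm = V(:,i);                     % line: nrm(1)*x + nrm(2)*y = d
    d = nrm' * [mean(x); mean(y)];
    err_perp = sum((nrm(1)*x + nrm(2)*y - d).^2);
    slope_tls = -nrm(1)/nrm(2);       % assumes the TLS line is not vertical
    intercept_tls = d/nrm(2);

The eight numbers asked for above come from evaluating each fitted line under both error measures, not just the one it was fit with.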

Part B

Overview.   In part B, you will calibrate a camera from an image of a coordinate system. You will use that image to extract the image locations of points with known 3D coordinates by clicking on them. The camera calibration method developed in class will then be used to estimate a matrix M that maps points in space to image points in homogeneous coordinates. Having done so, you will be able to take any 3D point expressed in that coordinate system and compute where it would end up in the image. This applies, of course, to the points that you provided for calibration, and the next step is to visualize and compute how well the "prediction" of where those points should appear compares with where you actually found them by clicking.

(Note that the following parts of this assignment will not work out accurately. You are purposely being asked to work with real, imprecise data that was collected quickly many years ago with an old camera).

Use the first of these images

IMG_0862.jpeg     (tiff version)
IMG_0861.jpeg     (tiff version)
IMG_0863.jpeg     (tiff version)
IMG_0864.jpeg     (tiff version)
IMG_0865.jpeg     (tiff version)

(tiff versions are supplied in case there are problems with the jpeg versions or the compression artifacts are giving you trouble, but note that the tiff versions are BIG!)

to calibrate the camera that took it, using at least 15 calibration points. You may find it easiest to use some image editing program to label the points in order to keep track of them. If you do label your points, provide the labeled image as part of what you hand in. If you find the time to do more than one calibration (using a different set of 15 points and/or a second image), you should comment on the agreement, or lack thereof, of the two results. You should hand in two files, world_coords.txt and image_coords.txt, which the grader can use to run your program to get your results (+).

To get the pixel values of the points, you can either write a Matlab script to get the coordinates of clicked points, use the program kjb_display (PATH on graphics machines is ~kobus/bin/linux_x86_64_c2, MANPATH ~kobus/doc/man), which is a modified version of the standard ImageMagick "display" program, or use some other software that you may know of. If you use kjb_display, use alt-D to select data mode. Then the coordinates of pixels clicked with the third button are written to standard output.
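If you take the Matlab route, a rough sketch using the built-in ginput might be (image is used rather than imshow so that no toolbox is needed; the output format here is just a guess at what image_coords.txt could look like):

    % Display the image and record 15 clicked pixel locations.
    im = imread('IMG_0862.jpeg');
    image(im);  axis image;
    [u, v] = ginput(15);              % click the calibration points in order
    dlmwrite('image_coords.txt', [u v], ' ');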

To set up a world coordinate system, note that the grid lines are 1 inch apart. Also, to be consistent with the light direction given later, the X-axis is the axis going from the center leftwards, the Y-axis is the one going from the center rightwards, and the Z-axis goes up. (It is a right-handed coordinate system.)

Using the points, determine the camera matrix (denoted by M in class) using homogeneous least squares (see the warning below). Report the matrix (+).
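A rough Matlab sketch of the homogeneous least squares (SVD) estimate, assuming world_coords.txt has one "X Y Z" row and image_coords.txt one "u v" row per point, in matching order:

    W = load('world_coords.txt');     % N x 3 world points (inches)
    I = load('image_coords.txt');     % N x 2 clicked pixel coordinates
    N = size(W, 1);
    A = zeros(2*N, 12);
    for k = 1:N
        X = [W(k,:) 1];  u = I(k,1);  v = I(k,2);
        A(2*k-1, :) = [X, zeros(1,4), -u*X];
        A(2*k,   :) = [zeros(1,4), X, -v*X];
    end
    [~, ~, V] = svd(A);               % solution is the right singular vector
    M = reshape(V(:,end), 4, 3)';     % with smallest singular value; 3x4, unit norm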

Using M, project the calibration points back into the image. This should provide a check on your answer. Provide an image showing the projected points (+).

Then compute the squared error between the projected points and where they should have gone (i.e., where you found them by clicking). This is an error estimate corresponding to the projection visualization just discussed.

Is this the same error that the calibration process minimized? Why? Provide an answer, and comment on whether this is good or bad, in the readme file.
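Continuing the calibration sketch above, the re-projection and squared error just described might be computed roughly as:

    W = load('world_coords.txt');  I = load('image_coords.txt');
    xh = M * [W, ones(size(W,1),1)]'; % 3 x N homogeneous image points
    u_proj = xh(1,:)' ./ xh(3,:)';    % "divide by w"
    v_proj = xh(2,:)' ./ xh(3,:)';
    sq_err = sum((u_proj - I(:,1)).^2 + (v_proj - I(:,2)).^2);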

Warning. It is possible to map the 3D points to the 2D points using a linear transformation, which you can find using non-homogeneous least squares. But it is important to realize that the mapping from 3D to 2D is fundamentally non-linear (the "divide-by-w" part), and thus a linear mapping is not as good. If you have the correct answer, the norm of your matrix should be 1, and further, its scale is arbitrary (you can double it and still get the same image coordinates). On the other hand, if you accidentally use the non-homogeneous method, then the matrix will not have unit norm, and the scale will matter (doubling it would lead to different image coordinates). You may find it interesting to do it both ways and compare, but make sure that you at least hand in the homogeneous least squares solution that we developed in class.

Part C (Required for grad students only --- bonus marks available for undergrads).

Determining the extrinsic/intrinsic parameters

Recall that in class we decided that M is not an arbitrary matrix, but the product of 3 matrices, one that is known, and the other two that have 11 parameters between them. Since there are 11 values available from M, this suggests that we can solve for those parameters. Let's give this a go!

Assume that the camera has perpendicular axes, so that the skew angle can be taken to be 90 degrees if needed. Use the equations on page 46 of the text to compute the extrinsic and intrinsic parameters of the camera. If you do not have the text, see the supplementary slides "intrinsic.pdf".

In particular, you will recover the orientation and location of the camera, which will be used in the next part. Report your estimates (+).
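A rough Matlab sketch of one standard decomposition (my reading of the Forsyth and Ponce formulas referred to above; check it against the text or intrinsic.pdf rather than taking it on faith):

    A3 = M(:,1:3);  b = M(:,4);
    a1 = A3(1,:)';  a2 = A3(2,:)';  a3 = A3(3,:)';
    rho = 1 / norm(a3);               % sign is ambiguous; pick it so the
                                      % scene ends up in front of the camera
    r3  = rho * a3;
    u0  = rho^2 * dot(a1, a3);        % principal point
    v0  = rho^2 * dot(a2, a3);
    ct  = -dot(cross(a1,a3), cross(a2,a3)) / (norm(cross(a1,a3)) * norm(cross(a2,a3)));
    theta = acos(ct);                 % skew angle (should be close to 90 degrees)
    alpha = rho^2 * norm(cross(a1,a3)) * sin(theta);
    beta  = rho^2 * norm(cross(a2,a3)) * sin(theta);
    r1  = cross(a2,a3) / norm(cross(a2,a3));
    r2  = cross(r3, r1);
    R   = [r1'; r2'; r3'];            % rotation (rows are r1, r2, r3)
    K   = [alpha, -alpha*cot(theta), u0; 0, beta/sin(theta), v0; 0, 0, 1];
    t   = rho * (K \ b);              % translation
    cam = -R' * t;                    % camera location in world coordinates

The recovered camera location can be sanity checked against the estimate (9, 14, 11) given in Part D.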

As a further direct check on the results, in this image (jpeg)     (tiff version) the camera was 11.5 inches from the wall. This can be used to compute alpha and beta more directly.

Part D

Computer vision meets graphics

Introduction: One of the consumers of vision technology is graphics. Applications include acquiring object models from images, and blending virtual worlds with image data; both require understanding the image. When you create a graphics image from scratch, the camera location and parameters are supplied by some combination of hard-coded constants and user input (for example, the user may move the camera with the arrow keys). In this part we have a different situation: we want to use the camera that took the image. Fortunately, we already know how to do this (consult M).

Now the task: Reportedly, the light was (roughly) at coordinates (33, 29, 44). Ask yourself if this makes sense given the shading of the objects in the images that have objects. We now want to render a sphere, with plausible shading, into one of the images that has objects in it. Using the Lambertian reflectance model, render a sphere in the second image with radius 1/2 inch, located at (3, 2, 3), using any color you like. So that this assignment does not rely on having taken graphics, we will accept any dumb algorithm for rendering a sphere. For example, you could model a sphere as:

x = x0 + cos(phi)*cos(theta)*R
y = y0 + cos(phi)*sin(theta)*R
z = z0 + sin(phi)*R
Now step phi from -pi/2 to pi/2 and step theta from 0 to 2*pi to get a bunch of 3D points that will trace out a sphere when projected into the image using the matrix M. (If your sphere has holes, use a smaller step size.)
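A rough Matlab sketch of generating and projecting the sphere points, using the M estimated in Part B (the step size is just a guess):

    rad = 0.5;  x0 = 3;  y0 = 2;  z0 = 3;
    [phi, theta] = ndgrid(-pi/2 : 0.02 : pi/2, 0 : 0.02 : 2*pi);
    X = x0 + cos(phi).*cos(theta)*rad;
    Y = y0 + cos(phi).*sin(theta)*rad;
    Z = z0 + sin(phi)*rad;
    pts = [X(:) Y(:) Z(:)];           % 3D sphere points
    ph  = M * [pts, ones(size(pts,1),1)]';
    u   = ph(1,:) ./ ph(3,:);         % image coordinates after the
    v   = ph(2,:) ./ ph(3,:);         % "divide-by-w" step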

Note that if you are working in Matlab, then, depending on how dumb your algorithm is, and how you implemented it, it may be slow.

There is one tricky point. We need to refrain from drawing the points that are not visible (because they are on the backside of the sphere). Determining whether a point is visible requires that we know where the camera is. The grad students will compute the location of the camera, but since this is not required of undergraduates, a serviceable estimate is: 9, 14, 11

Assume that the camera is at a point P. For each point on the sphere, X=(x,y,z), the outward normal direction for the point on the sphere can be determined (you should figure out what it is). Call this N(X). To decide if a point is visible, consider the tangent plane to the sphere at the point, and compute whether the camera is on the side of the plane that is outside the sphere. Specifically, if

(P-X).N(X) > 0
then the vector from the point to the camera makes an angle of less than 90 degrees with the surface normal, and the point is visible.
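Continuing the sphere sketch above, the visibility test and Lambertian shading might be combined roughly as follows (note that this sketch fills in the sphere normal, which the text above leaves for you to determine):

    P = [9 14 11];                    % camera location (or your Part C answer)
    L = [33 29 44];                   % reported light location
    C = [x0 y0 z0];
    for k = 1:size(pts, 1)
        Nrm = (pts(k,:) - C) / rad;              % outward unit normal N(X)
        if dot(P - pts(k,:), Nrm) > 0            % (P-X).N(X) > 0: visible
            Ldir  = (L - pts(k,:)) / norm(L - pts(k,:));
            shade = max(0, dot(Nrm, Ldir));      % Lambertian term
            % color the pixel near (round(v(k)), round(u(k))) with
            % shade times your chosen sphere color
        end
    end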

Part E (Optional, modest extra credit)

What is a calibrated camera good for?

The inside of the box with graph paper plays the role of a calibration object, much like the checkerboard box used in the class demo. Once the camera is calibrated, the calibration object can be removed, as it has told us all it "knows". Recall that the major grid lines are one inch apart, which defines the absolute scale of your clicked 3D points.

Have a look at the second image, and suppose that the graph paper was no longer there. (It would be fun to take new images so that you don't have to pretend, but I cannot arrange this for fall 2011.) Assume that the figurines are on the ground plane (z=0). Click on points where the feet contact the ground, and using those points and your knowledge of the camera, estimate the distance between the feet. Since human-sized characters would have their feet much farther apart than this, you can now claim that these are small figurine facsimiles, not real characters.
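One rough Matlab sketch of the back-projection onto the ground plane, using M from Part B:

    [uc, vc] = ginput(1);             % click one foot contact point
    H = M(:, [1 2 4]);                % with z = 0, M reduces to a 3x3
                                      % ground-plane-to-image homography
    p = H \ [uc; vc; 1];
    foot1 = p(1:2) / p(3);            % (x, y) on the ground plane, in inches
    % Repeat for the second foot; the distance is then norm(foot1 - foot2).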

Part F (Optional, modest extra credit)

More fun with graphics

A) Add a specularity on the sphere.

B) Render the shadow of the sphere.

Note that these problems are not completely trivial. For (A), you should develop equations for where you expect the specularity to be. You will need to consider the light position, the sphere, and the camera location. You may find it easiest to write a point on the sphere as X = X0 + R*n, where n is the unit normal, which varies over the sphere.
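If you prefer a purely numerical route, one simple per-point alternative to the analytic derivation is a Phong-style specular term evaluated at each visible sphere point (a rough sketch; the exponent and the example point are arbitrary):

    rad = 0.5;  C = [3 2 3];          % sphere from Part D
    P = [9 14 11];  L = [33 29 44];   % camera and light locations
    phi = pi/4;  theta = pi/3;        % one example sphere point
    Nrm = [cos(phi)*cos(theta), cos(phi)*sin(theta), sin(phi)];
    X = C + rad * Nrm;                % X = X0 + R*n
    Vdir = (P - X) / norm(P - X);     % direction toward the camera
    Ldir = (L - X) / norm(L - X);     % direction toward the light
    Rdir = 2 * dot(Nrm, Ldir) * Nrm - Ldir;   % mirror reflection of the light
    spec = max(0, dot(Rdir, Vdir))^50;        % exponent controls highlight size
    % Add spec (times a highlight color) to the Lambertian shade for this point.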


What to Hand In

Hand in your code for producing the various required outputs: the line fitting numbers, the plots of the fits, the matrix M and the camera parameters (grad), an image showing the re-projected points, and the re-projection error. Because the grader will run your code, you need to hand in the data files with your selected points in them. In addition, record your estimates of M and the camera parameters (grad) in your README.

You should hand in code for adding a sphere to an image under the given camera model. You should also provide a copy of the image with an added rendered sphere. Don't forget to tell us where and how big the sphere is supposed to be.

If you are working in Matlab: You should provide a Matlab program named hw3.m, as well as any additional .m files if you choose to break up the problem into multiple files.

If you are working in C/C++ or any other compiled language: You should provide a Makefile that builds a program named hw3, as well as the code. The grader will type:

    make
    ./hw3
You can also hand in hw3-pre-compiled, a pre-built executable that can be consulted if there are problems with make. However, note that the grader has limited time to figure out what is broken with your build.

If you are working in any other interpreted/scripting language: Hand in a script named hw3 and any supporting files. The grader will type:

    ./hw3
To hand in the above, use the turnin program available on lectura (turnin key is cs477_hw3).