Fall 2011 - CS477/577 - An Introduction to Computer Vision

Assignment Four
Due: Friday, October 21, 2011 (by 9am Saturday).
Credit (U-grad): Approximately 8 points (Relative, and very rough absolute weighting)
Credit (Grad): Approximately 8 points (Relative, and very rough absolute weighting)

This assignment must be done individually

Language choice

You can do this assignment in any language you like, with either Matlab or C/C++ together with the vision lab support code being the two recommended choices. Python might be up to the task also, but you will have to sort out numerical library support on your own (or as a group). Regardless of the language you choose, please head the "works out of the box" requirement as described in "what to hand in."

It is probably a bit easier in Matlab, but doing some of the assignments in C may prove to be a useful exercise. If you think that vision research or programming may be in your future, you might want to consider doing some of the assignments in C/C++. If you are stretched for time, and like Matlab, you may want to stick to Matlab. Your choice!

Information for those working in C/C++.

Assignment specification

This assignment has five parts, two of which are required for undergrads, and four of which are required for grads. The rest are optional (modest extra credit is available).

To simplify things, you should hard code the file names in the version of your program that you hand in. You can assume that the grader will copy symbolic links of needed file is in your direct when they run your code.

I need a report in PDF as well as code. The PDF can take the place of the readme. In fact, please call it "README.PDF". While the grader may run your code, they should be able to grade the assignment from the PDF alone..

The big picture.

In this assignment you will reinforce your understanding of shape from shading in the simplest case of photometric stereo. The treatment will basically follow the notes.

Subsequent to that, we will experiment with computation color constancy. You will you test out the ability of the two simplest algorithms to remove a blue color cast from images taken under a light that is a bit blue for the camera.

Below you will find images of a synthetic Lambertian surface imaged under seven different known lights that are provided. The goal is to determine the surface normals at each pixel, and then from that, a depth map for the surface.

Part A (Required for both undergrads and grads)

The following seven files

    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/4-1.tiff  
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/4-2.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/4-3.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/4-4.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/4-5.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/4-6.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/4-7.tiff

(smaller versions in case the above are too slow in Matlab)

    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/small/4-1.tiff  
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/small/4-2.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/small/4-3.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/small/4-4.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/small/4-5.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/small/4-6.tiff 
    /cs/www/classes/cs477/fall11/ua_cs_only/assignments/small/4-7.tiff

are seven images taken of a Lambertian surface with the light at seven different directions. Those directions are (in order):

      0.44721360   0.00000000   0.89442719
      0.27872325   0.34950790   0.89451528
     -0.09788109   0.42884510   0.89805967
     -0.37613011   0.18113471   0.90868936
     -0.35093186  -0.16899988   0.92102436
     -0.09599213  -0.42056900   0.90216807
      0.26886278  -0.33714326   0.90224566

You can assume that the projection is orthographic, with the z axis being normal to the image plane. You may recall that this means that the point (x,y,z) is simply projected to (x,y,0). If we ignore issues of rotation and units, this is like an aerial photograph where the points are far enough away that the relief does not matter.

A possible point of confusion regarding data formats is that images are typically indexed by row (increases in the direction that you normally think of as negative Y), and column, (increases in the direction that you normally think of as positive X). It is best to keep this convention for this assignment. If you use a different one, such as using X as the horizontal axis, you will get a different solution that is more difficult to interpret.

Now the meat:

Use photometric stereo to compute the surface normals at each point of the image. Notice that the surface has different albedo in the four quadrants. The normals that you compute must be independent of this. Demonstrate that you have good values for the normals by creating an image of the surface as if it had uniform albedo, and though it was illuminated by a light in the direction of the camera (i.e., in direction (0, 0, 1). You can assume that the albedo/light combination is such that a surface that is perpendicular to the camera direction (i.e., the normal is in the direction (0, 0, 1) has pixel value (250, 250, 250).

Submit code to produce this image, and put the image into the README.PDF as well (+).

Part B (Required for grad students only).

Now compute a depth map of the surface, and make a 3D plot the surface z=f(x,y). Do this by integrating the partial derivatives along a path. You can specify any point you like to be the reference "sea-level" point with z=0.

For those working in C/C++, I have provided a simple 3D plotting routine. Unfortunately, it is a bit crude, but it is sufficient for this assignment.

Submit code to produce this plot, and also put the plot into your README.PDF (+).

Part C (Optional for both undergrads and grads).

Extend the assignment in some way, and let the grader know what you did in sufficient detail. Hand in what is appropriate. Extensions that come to mind include:

Explore the effect of noise. Add noise to the images in increasing quantities, and see how the surface reconstruction degrades. Does averaging over multiple paths improve the reconstruction? How can you measure how well you are doing?

Explore the effect of assuming a different shading model than Lambertian. For example, what if the brightness was proportional to the square root or square of the dot product.

Create some data for yourself that for some different shapes and surfaces, and try to solve for them.

Attempt to solve the problem without the light directions. This is a bit challenging, and a perfect solution is not required. However, you should explain what you have tried.

Your own creative idea!

Part D (Required for both undergrads and grads).

Background

This link points to gzipped tar file of a directory containing image pairs where one of the images is taken under light and the other image is taken under another light. Hopefully the organization is clear from the file names.

Note that some of these images are very dim, because they were imaged so that specularities (if they exist) did not cause too much pixel clipping. A pixel is clipped if one of the channels would have a value of over 255, and is set to 255. You cannot trust the value of a pixel that is 255, or even close to that. However, the images were also taken in such a way to keep noise low. Hence you can scale them up for visualization, and even computation as long as you stay in a floating point representation.

These images also appear dim because they are linear images (not gamma corrected). This simplifies things for us, but for the kind of question we want to ask of images, it can matter if they are gamma corrected, and we might need to linearize them first. Fortunately for us, this has already been done.

Inspection of the file macbeth_syl-50MR16Q.tif reveals that the "syl-50MR16Q" light (50MR16Q is just the Sylvania light bulb number), is reasonably close to what this camera expects, as the white patch (bottom left) looks neutral, and the R,G,B values are relatively even (not perfect, but good enough for our purposes). We will thus take the "syl-50MR16Q" as our canonical light, and one interpretation of the color constancy problem is to make images of scenes look as though this light was shining on them (instead of some other light).

In color constancy we often do not care about the absolute scale of our estimate of the illumination (R,G,B), or the error in the estimated brightness of the image. We will assume that our application is really about getting the chromaticity correct. To make things easier to grade, report the color of the illumination scaled so that the max value is 250. E.G., convert (125, 100, 75) to (250, 200, 150). We will consider two error measures for color constancy:

Angular error of illuminant estimate. Here consider the illumination color, and the estimate there of, as vectors in 3D space, and compute the angel between them in degrees.
RMS (r,g) mapping error. We can use the illumination estimate to convert an image to a different one which is an estimate of image as if it the scene where illuminated by the canonical light. (Recall that r=R/(R+G+B) and g=G/(R+G+B)). When we do this, we must exclude dark pixels. Because these images were taken carefully, we can use a relatively "dark" definition of dark. Lets exclude pixels where R+G+B is less than 10. (If you need to reduce this threshold, let me know by email, and/or in your readme). The RMS error is square root of the average squared error over all the r and all the g (taken together) except where one of the original images was too dark.

Finally, we are ready to begin!

Problems

1) Average some of the pixels in the white patch in macbeth_syl-50MR16Q.tif, and then scale the result so the max is 250 to provide an estimate of the illuminant color (+).

2) Do the same for macbeth_solux-4100.tif. What is the color of the "solux-4100" light? (+)

3) What is the angular error between the two light colors? (+).

4) Now use the diagonal model to map the second (bluish) image to the one under the canonical light. Display three images: The original (bluish) one, the (hopefully) improved one, and the canonical one for reference. To make things easy to inspect, scale each of these images so that the max value in any channel is 250. (+)

5) Compute the RMS error in (r,g) of the pixels between the original image and the canonical, and the (hopefully) improved image and the canonical.

6) Using the MaxRGB algorithm, estimate the light color in the remaining solux-4100 images, and report the angular errors between this estimate and the solux-4100 light color you measured using the macbeth color chart image. (+).

7) Using the MaxRGB illumination estimate, display image triplets as you did for macbeth: original, corrected image, and canonical (target) image (+). Report the (r,g) RMS error for the mapped images (+). Is there good agreement between the general trend of the two kinds of errors? (+).

8) Finally, repeat the previous two questions using the gray-world method instead of the MaxRGB method (+, +). Which method is working better on this data? (+).

Part E (Required for grads only).

Derive a formula for the best diagonal map between the (R,G,B) under one light and the (R,G,B) under another light using sum of squared errors as your definition of best (+). Based on your derivation, is this diagonal map guaranteed to give a better answer using the (r,g) measure than any algorithm you might invent (+)?

Following the same pattern of investigation as before, provide before, after, and target images for the correction procedure you just derived (+). Provide (r,g) error estimates for this method as well (+).

What to Hand In

A report in a PDF file called README.pdf that has all your results, images, plots, and answers to questions. The grader should be able to grade the assignment from the PDF, without running your code.
Hand in code that computes the required results including displaying the requested images.
Provide a README.txt that reminds me how to run your program and which machine you have tested it on and any similar meta-information such as you did an extra question, or that you did not do a question, or that your program does not work for a particular question --- people get busy and have to skip a question on occasion and it is easier for the grader if this is in your README.txt). However, different than previous assignments, answers to vision questions should be in the report, not in READE.txt.

For your code:

You should provide a Matlab program named hw4.m, as well any additional dot m files if you choose to break up the problem into multiple files.

If you are working in C/C++ or any other compiled language: You should provide a Makefile that builds a program named hw4, as well as the code. The grader will type:

    make
    ./hw4

You can also hand in hw4-pre-compiled which is an executable pre-built version that can be consulted if there are problems with make. However, note that the grader has limited time to figure out what is broken with your build.

If you are working in any other interpreted/scripting language: Hand in a script named hw4 and any supporting files. The grader will type:

    ./hw4

To hand in the above, use the turnin program available on lectura (turnin key is cs477_hw4).