This course requires a substantive project. Especially in the case of graduate students, I strongly prefer research oriented projects. By this I mean projects which support some research effort, whether it is your own, the vision lab's, or someone else's.
I prefer that undergraduate projects also be research oriented, but in this case there will be more scope for other options. Undergraduate projects do not need to be as substantive as graduate ones.
Below I have begun a list of project suggestions. I will update this list frequently during the first week of February. Some are marked as more suitable for undergraduate students and some are more suitable for graduate students or undergraduate students who wish to commit significant time to the project.
It is not necessary to choose a project from the list. They are only suggestions.
All project plans need to be finalized with me.
Group projects are fine in general, but will be considered in the context of specific projects and in the context of the plan for distributing the work.
Because the projects are meant to be research oriented, it will be assumed that others have your blessing to use what you have developed once you are done with it. If, for any reason, this is not the case, then you need to let me know in advance. This will exclude you from doing many (if not all) the suggested research projects, as these are mostly parts of larger projects. Thus, if you have any reservations about releasing your work, you should work with me to find a suitable project. Other than restricting the choice of projects somewhat, there is no consequence for choosing (in advance) not to release your work, and you do not need to give a reason for your choice.
Scope: I am willing to consider a wide range of projects related to vision, including some which involve graphics, bio-informatics, and image databases.
Publication potential: I have noted a possible publication target for some projects. This should not be taken to imply that those projects are the only ones that could lead to a paper. Most, if not all, have that potential. However, publication likely requires that someone (possibly you) continue the work after the course is over (this is not required!).
Feb 17: By this time you have discussed your project with me, at least by E-mail. You have a good idea of what you want to do. If you have chosen a project that could be held up due to circumstances beyond your control (e.g. for some great projects we are still sorting out the data), then you will have contemplated a second choice.
Feb 24: By this time you have chosen your partners and a project. In anticipation of the formal project proposal due the next week, you will send me a draft proposal.
Mar 3: Presentation of project proposal in class (need not exceed 5 minutes). Formal project proposal is also due. Depending on what you are working on and how familiar I am with it, this can be short. You will, however, need to give some idea about what you plan to do for each of the reviews, as well as some specification of what each participant in group projects will be doing.
March 29: First project review due.
What you need to provide is arguably project dependent. However, I suggest approaching it in the following way. Your project has an implementation component and a written component. The written component could be viewed as a draft of a paper, and the project reviews could be thought of as iterations on that draft. Your proposal was the first draft. On each iteration, you move from writing what you are going to do to writing what you have done. For many projects, images are appropriate, and are an excellent way to help tell the story.
April 13: Second project review.
I have made turnin keys cs477_pr1 and cs477_pr2 for project review materials. Formats: PDF is fine, or HTML, or even text. Images are good, either in the PDF document, pointed to by the HTML, or just in the directory but referred to in your text.
It may help to think about what the purposes of the reviews are. First, they should convince me that you are on track. Second, they should help you clarify your thinking. Third, for some of you, some of what you write can form early drafts of parts of papers, on which you can get feedback when it is most effective (early on). So try to make it a useful exercise instead of yet another hoop to jump through.
See first project review for details. The second project review is like the first one except that there should be even more substance.
May 13, May 11 (if necessary), week of the 4th (by special request): Project demos/presentations:
I have identified two possible slots for project demos/presentations: May 11 in the afternoon, and May 13, after the exam (i.e. around 4:15 onwards). The slot after the exam worked fine last term for the graphics class--we followed it by a session in the pub on the instructor. However, we can spread the presentations over the two time slots, or use May 11 only, if that is what everyone prefers. We will negotiate the final distribution over the time slots closer to the time. If someone wants to present their project during the week of the 4th, this is possible by special request.
May 13: Final code and write-ups due (hard deadline).
Tracking people in the dark (could support several sub-projects)
(Ongoing vision lab work. Kevin Wampler is in charge)
For virtual reality applications we need camera systems which determine the configuration of the people in the space (e.g. which way you are looking, what direction you are pointing) from camera data. For artistic VR, we need to do this in the dark. We are exploring two ways to do this. First, use 4 infrared cameras and 4 infrared sources. Second, work with the light from the VR screen (you know what it is--this may help). Both approaches could be combined. In addition to working out the geometry, models of humans have to be fit to the data. Finally, the temporal change of the configuration should be tracked, and gesture recognition should be attempted. There are a number of ways that this project could be simplified and/or broken up.
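For the geometry side, here is a minimal sketch of recovering a 3D point seen by two calibrated cameras, using standard linear (DLT) triangulation; the projection matrices are assumed to come from a separate calibration step, and all names are illustrative only:

import numpy as np

def triangulate_point(P1, P2, x1, x2):
    # P1, P2: 3x4 camera projection matrices (assumed known from calibration).
    # x1, x2: (u, v) pixel coordinates of the same point in each view.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

With four cameras the same idea extends by stacking two rows per camera, which also gives some robustness when a marker is occluded in one of the views.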
Participants in this project should be prepared to join regular meetings on this topic.
Hong Hua (faculty in optical sciences) also does cool VR stuff. She focuses on collaborative environments and combining virtual reality with physical objects. She also would like to use camera data to get information about what is going on in the (real) VR space. She will present at the vision group meeting February 3. Some additional specifications are available as a word file.
Learning color semantics (WordNet Color Project) (single person undergrad project)
(Currently under consideration by Abin)
An undergraduate student (TszYan Chow) is working with faculty member Sandiway Fong on understanding color names as they relate to the WordNet ontology. She is linking WordNet words with images found with the Google API to try to understand the semantics of color words. The idea is to help ground color word meanings, and to automatically determine the linking between standard color space data (such as color histograms) and the color words in these images. Your role would be to join this project, help with the image processing, and in general deal with the vision side of the equation as needed.
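To give a feel for the vision side, here is a minimal sketch of linking a color word to color space data, assuming the images retrieved for each word are simply stored in one directory per word (the directory layout and function names are hypothetical):

import numpy as np
from pathlib import Path
from PIL import Image

def color_histogram(image_path, bins=8):
    # Coarsely quantized RGB histogram (bins^3 cells), normalized to sum to 1.
    rgb = np.asarray(Image.open(image_path).convert("RGB"))
    idx = (rgb // (256 // bins)).reshape(-1, 3)
    cells = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(cells, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def word_color_profile(word_dir):
    # Average histogram over all images retrieved for one color word.
    hists = [color_histogram(p) for p in Path(word_dir).glob("*.jpg")]
    return np.mean(hists, axis=0)

# e.g. profile = word_color_profile("google_images/crimson")

The interesting part is then relating such profiles to the WordNet structure (e.g. do the hyponyms of "red" have profiles concentrated in the same region of color space?).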
Manipulating soft (DXA) X-ray data and linking it with MRI data
(This project requires confirmation on data availability which should come very soon)
This project is the exploratory analysis for a big and very important project. Depending on what aspects are undertaken, several people could work on this project. Even with several people working on it, completion of the entire project is not likely to be possible. I will describe the general thrust of the project.
The main idea is to use DXA data to measure muscle mass loss in elderly women. The strategy is to warp/align DXA data to MRI data (the standard), and then build a classifier or a statistical model so that muscle mass can be estimated using inexpensive/portable DXA imaging instead of expensive MRI imaging.
The first concrete task likely will be creative hacking to reverse engineer the data format produced by the DXA machine (and/or dealing with ancient code and docs). Having done that, image processing could be applied for a variety of tasks such as identifying the skeleton. Then we will want to align multiple DXA images (same person different times) as best as possible so differences can be tracked over time. Possibly having done that, the data needs to be warped/aligned/mapped to the MRI data. Then a classifier on the DXA data needs to be trained using the MRI data.
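As a rough sketch of the alignment and classification steps (the names and data layout are hypothetical, and real registration would be more sophisticated than an integer translation search):

import numpy as np
from scipy.ndimage import shift
from sklearn.linear_model import Ridge

def align_by_translation(fixed, moving, max_shift=20):
    # Brute-force search for the integer translation that best aligns two DXA
    # scans of the same person; a crude stand-in for real warping/registration.
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.mean((fixed - shift(moving, (dy, dx), order=0)) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Hypothetical: rows of X are per-subject DXA features (e.g. regional
# attenuation statistics over the aligned images); y is the muscle mass
# measured from the MRI standard.
# model = Ridge(alpha=1.0).fit(X_train, y_train)
# predicted_mass = model.predict(X_new)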
Understanding shadow statistics (a color project)
This project is ambitious for one term. Again I will lay out the whole project from which a significant start could be planned. If sufficient progress was made by late March, a paper could be submitted to the color imaging conference which has a deadline of April 6.
Have a look at this paper and notice that it has lots of nice ideas but is also one hack after another. The goal of this project is to start to deal with the hacks.
The next part is to further develop methods to estimate the probability that an image boundary is an illumination boundary using the data collected as training data. (A literature search here is a good idea).
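A minimal sketch of what such an estimator might look like, assuming the collected data provides per-boundary features and 0/1 labels (the file names and the choice of features are purely illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical layout: each row of X describes one image boundary segment
# (e.g. chromaticity shift across the edge, intensity ratio, edge sharpness);
# y records whether the training data marks it as an illumination (shadow)
# boundary rather than a material boundary.
X = np.load("boundary_features.npy")     # (n_boundaries, n_features), hypothetical file
y = np.load("boundary_is_shadow.npy")    # (n_boundaries,) 0/1 labels, hypothetical file

clf = LogisticRegression(max_iter=1000).fit(X, y)

# For a new image, the classifier gives P(illumination boundary | features),
# which can feed into the illumination field fitting described next.
# p_shadow = clf.predict_proba(X_new)[:, 1]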
The next part is to fit the illumination field. Instead of least squares, a sampling approach seems to be appropriate. Several references would have to be read, and possibly implemented to compare against.
The final part is to propose new models for the task. This paper might be relevant, as are several others.
Vision software for the Aerial Robotics Project (slightly geared towards undergrad, but I can be flexible)
The UA aerial robotics project needs your help! (I will skip information about the project itself since it was presented in class.) I will also get a more detailed picture of the status of the various components soon. A teaser of several project possibilities:
Nikhil Shirahatti has worked on this already, and can give some pointers and share some code.
Identifying the difference between an open and a closed window. This may take some creativity. Does infrared light help? Polarization? Is there enough reflected light? Can you arrange reflected light? Can a range sensor be made to work (the window is quite far away)?
The group has made good progress identifying building structure from multiple images. Likely more work is needed (I will check).
Non-linear manifolds for image spaces (likely one grad, keen u-grad, or two u-grads, but I could be flexible)
There is much high dimensional data in computer vision. Since the data is not random, it typically lives on a lower dimensional manifold. Often it is assumed that this manifold is linear. However, this assumption is typically wrong, and leads to manifolds which are much too big (in terms of the number of dimensions). In this project you will apply methods for finding non-linear manifolds to interesting data.
The first task is to get the Matlab code on the web for IsoMap and LLE (start with IsoMap). (You may need a login/passwd to get these ("me"/"read4fun").) Implement a C version of both for the vision group software library. Now apply them to some fun problems, starting with laying out images for browsers based on data supplied by Kobus, and possibly other measures limited only by your imagination. We have browsing software which uses the layout matrix. Other data of interest is FACS data from the FACS project (assuming that someone does it).
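Before writing the C versions, it may help to prototype with the equivalent routines in scikit-learn to see what the layouts look like; a minimal sketch (the random features are just a placeholder for the data supplied by Kobus):

import numpy as np
from sklearn.manifold import Isomap, LocallyLinearEmbedding

features = np.random.rand(500, 64)   # placeholder: one row of image features per image

# Embed into 2D for the image-layout browser; IsoMap first, then LLE for comparison.
iso_layout = Isomap(n_neighbors=10, n_components=2).fit_transform(features)
lle_layout = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(features)

# iso_layout is an (n_images x 2) layout matrix of the kind the browsing software uses.
np.savetxt("layout_matrix.txt", iso_layout)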
Automatically determining FACS data
A number of groups have published algorithms (and code) for taking human face
data and computing the FACS representation of it (with the goal of recognizing
expression). The goal of this project is to get a FACS system up and running at
UA. Begin by reading some papers and downloading software. See what works. See
what needs to be improved. See what needs to be sped up, modularized, etc.
Unless an existing system is perfect (only in your dreams), we will need to
understand it and its limitations well enough to build on it. It may be
necessary to implement some promising algorithms for which there is no code (or
unusable code).
Data: I think that there is plenty of test data on the web. However, you can, and
perhaps should, take some pictures yourself using our cameras (currently on
order, but should arrive soon).
Shape representation
(likely challenging, possibly more than one person could
work on this independently, taking different approaches)
Understanding 2D shape is one of the big open problems in computer vision. This
project is very exploratory. However, there is some possibility of publication
next fall.
Get some shape data of common objects (lots of examples of lots of different ones). I intend to order the Hemera set of masked images. They may come with labels like "zebra"; you may have to reverse engineer an indexed database to get the labels; you may even have to label some yourself. Alternatively, you may be able to find/suggest other sources and/or manually extract shapes and/or label them. One source worth considering is this one.
Having got multiple examples of a bunch of interesting shapes (e.g., 50 zebras, 30 tigers, 40 dogs, 25 houses), experiment with a variety of shape descriptors and build a classifier for them based on shape. Perhaps something like shape context is worth trying. (You may need a login/passwd to get this paper ("me"/"read4fun").)
There are many other possibilities. In the long run the goal is to understand
the problem well enough to build a new descriptor/classifier combination which
works better than existing ones.
A literature search for both shape descriptors, and anyone recently taking the
approach suggested here, is a must.
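As a concrete (and deliberately naive) starting point, here is a sketch of a very crude descriptor plus classifier pipeline; it is not shape context, just something to get data flowing, and the mask/label variables are assumed to come from whichever shape source you end up using:

import numpy as np
from scipy.ndimage import binary_erosion
from sklearn.neighbors import KNeighborsClassifier

def radial_signature(mask, n_samples=64):
    # Crude shape descriptor: distances from the centroid to boundary points,
    # sampled around the contour and normalized for scale.
    mask = mask.astype(bool)
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    boundary = mask & ~binary_erosion(mask)
    by, bx = np.nonzero(boundary)
    angles = np.arctan2(by - cy, bx - cx)
    dists = np.hypot(by - cy, bx - cx)
    order = np.argsort(angles)
    idx = np.linspace(0, len(order) - 1, n_samples).astype(int)
    sig = dists[order][idx]
    return sig / (sig.max() + 1e-9)

# Hypothetical: masks and labels come from the Hemera (or other) shape data.
# X = np.array([radial_signature(m) for m in masks])
# clf = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
# predicted = clf.predict([radial_signature(new_mask)])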
Texture mapping outdoor photos
The goal of this project is to improve the realism in visualizing a
landscape. We have the 3D terrain model given by DEM (Digital Elevation
Model) data, but we are lacking a detailed texture map to paint it with.
The idea is to take digital photos of mountains (not taken in any specific
way, just as people hike or ride around), and figure out how to texture
map them on the terrain model. Assume that the exact position of the
camera for each photo is known. We do not know the direction the photo
was taken in, however.
A first round idea is to set a height for the camera (~5 feet), and spin
the view 360 degrees, looking at the resulting skyline from the terrain
model. Matching this skyline to the skyline extracted from the
image would give us the direction the photo was taken in, and some first round
approximation of texture coordinates could be extracted.
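A minimal sketch of the skyline-matching step, assuming both skylines have been reduced to one elevation angle per degree of heading (the skyline extraction and resampling are separate problems):

import numpy as np

def best_heading(model_skyline, photo_skyline):
    # model_skyline: elevation angle of the skyline for each of 360 headings,
    #                rendered from the terrain model at the known camera position.
    # photo_skyline: skyline elevation angles across the photo's field of view,
    #                resampled to one value per degree.
    fov = len(photo_skyline)
    errors = []
    for heading in range(360):
        # Slide the photo skyline around the circular model skyline.
        window = np.take(model_skyline, np.arange(heading, heading + fov), mode="wrap")
        errors.append(np.mean((window - photo_skyline) ** 2))
    return int(np.argmin(errors))   # heading (in degrees) with the best match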
Scott and Alan Morris have code for constructing and visualizing 3D
terrain models, as well as for determining the latitude/longitude of
digital photos (this is determined by looking at the time a photo was
taken and relating it to a GPS tracklog of a hiker/biker's trip). So,
much of the required infrastructure for this project is available.
Blind search for interesting features
The purpose of this project is to treat image patches, or simple functions of them,
as features, and look for useful correlations (or high mutual information)
between the responses to those features and words associated with the images. So
presumably we would automatically discover that a blue patch is a good feature
for sky and an orange stripy patch is a good feature for tigers. Things become a
little more complex because there are many slightly different blue patches. Once
we choose one, the patch that gives most additional information about sky is
likely to be quite different. (Think orthogonal basis, and note relation
to clustering mentioned below).
Since we are looking for correlations based on images, and the feature detector
responds to a part of an image, we would want to experiment with combining the
responses. For example, likely the first thing to try would be simply taking the
maximum, but there are other possibilities.
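A minimal sketch of scoring one candidate feature against one word, using the maximum over patches and mutual information (the arrays and the binarization threshold are illustrative):

import numpy as np
from sklearn.metrics import mutual_info_score

def feature_word_score(responses, has_word, threshold=0.5):
    # responses: (n_images, n_patches) responses of one candidate patch feature
    #            over every patch of every image.
    # has_word:  (n_images,) 0/1 vector; 1 if the word (e.g. "sky") is attached
    #            to the image.
    per_image = responses.max(axis=1)              # combine patches by taking the maximum
    binarized = (per_image > threshold).astype(int)
    return mutual_info_score(has_word, binarized)  # higher means a more useful feature

In the search, a feature would be kept if its score is high, and subsequent candidates would be scored by how much additional information they give beyond the features already chosen.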
Doing a blind search is, of course, computationally intense. There is no hope of
trying them all. The first task would be to implement fast detection (one method
is done or nearly done, a second method that needs to be tried is not overly
difficult, and it would be useful in general to know how they compare in terms
of speed). Also, it is likely best to find a clustering of patches first.
Once things are in place, we would then keep a cluster of CPU's warm finding
some interesting features.
Of course, doing better than blind search is of interest, and likely with a bit
of thought, we could do something more clever. At which point we would have to
change the project title.
Illumination models for graphics
This is essentially a graphics project with some vision added. It should only be
attempted by someone quite experienced in graphics, so that it does not simply
become a graphics project. The point of the project would be to exploit
your skill at graphics to do something interesting with illumination models
based on measured data (which exists). In particular, I would like to see
fluorescent surfaces handled in a specific way (come see me for details).
Working the radiosity equations in this paradigm would be very interesting.
There has been very limited work on fluorescent surfaces in graphics, and
effectively including them in a renderer would be a great project.
If sufficient progress was made by late March, a paper could be submitted to the
color imaging conference which has a deadline of April 6.
Measuring color constancy at a pixel
Implement this paper and look at the variation of the solution over the pixels of an image. Average the results over the image as a way to do color constancy, and integrate this method into this existing color constancy algorithm tester.
Of interest is how this and other algorithms behave as the size of the image patch given to them changes.
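A minimal sketch of the averaging and the patch-size experiment; the per-pixel method itself is whatever the paper specifies, so it appears here only as an argument to be filled in:

import numpy as np

def average_illuminant(per_pixel_estimates):
    # per_pixel_estimates: (H, W, 3) RGB illuminant estimate at each pixel.
    # Average them and normalize, so only the illuminant chromaticity matters.
    est = per_pixel_estimates.reshape(-1, 3).mean(axis=0)
    return est / np.linalg.norm(est)

def estimates_by_patch_size(image, per_pixel_method, sizes=(16, 32, 64, 128)):
    # Run the method on centered crops of increasing size to see how the
    # answer changes with the size of the patch given to the algorithm.
    h, w, _ = image.shape
    results = {}
    for s in sizes:
        y0, x0 = (h - s) // 2, (w - s) // 2
        patch = image[y0:y0 + s, x0:x0 + s]
        results[s] = average_illuminant(per_pixel_method(patch))
    return results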
Classifying normal versus non-normal neurons
(Under consideration by Kael)
This directory contains two subdirectories with images from neurons and their
respective skeletonizations. (It is not clear whether there is any mileage in
improving the skeletonizations.) The neurons form two groups (either a "c" in the filename or
a "d"). The difference is related to the amount of curvature at the endings.
(We will find out more about this). The object is to write a classification
program. On the assumption that the curvature of the endings is an issue, a
start would be to identify the endings, color them red (to verify), compute the
average curvature, and see if this can be used to classify the two groups.
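A minimal sketch of the first step, finding the endings from the skeleton images (the curvature measurement and the classification threshold are left as hypothetical pieces):

import numpy as np
from scipy.ndimage import convolve

def skeleton_endpoints(skel):
    # Endings are skeleton pixels with exactly one skeleton neighbour
    # in their 8-neighbourhood.
    skel = (skel > 0).astype(np.uint8)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    neighbour_count = convolve(skel, kernel, mode="constant")
    return (skel == 1) & (neighbour_count == 1)

# Hypothetical next steps: trace a short arc back from each ending, estimate
# its curvature (e.g. angle change per unit arc length), average over endings,
# and threshold to separate the "c" group from the "d" group.
# mean_curv = np.mean([ending_curvature(skel, p) for p in zip(*np.nonzero(skeleton_endpoints(skel)))])
# label = "c" if mean_curv > threshold else "d"

Coloring the detected endings red on the original images is then a quick visual check that the right pixels are being measured.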
Using neural networks for word prediction (challenging!)
This project requires previous experience with neural networks or a keen desire to put extra time into learning about them---the instructor is neither a neural network expert nor an advocate for them, and has no idea if this can be made to work. A sufficiently detailed and intelligent report of failure would be acceptable.
Consider data of the form:
Document 1: continuous feature vector 1, continuous feature vector 2, ..., continuous feature vector n_1; discrete binary vector (a sequence of yes's and no's)
Document 2: continuous feature vector 1, continuous feature vector 2, ..., continuous feature vector n_2; discrete binary vector (a sequence of yes's and no's)
...
Document M: continuous feature vector 1, continuous feature vector 2, ..., continuous feature vector n_M; discrete binary vector (a sequence of yes's and no's)
The number of feature vectors is variable, but they could be forced to be the same in the initial project phases to simplify things. The project is to construct a neural network to predict the binary vector data from (a) the continuous feature vectors, and (b) a single continuous feature vector.
To be more concrete, and explain why this relates to vision, we can consider the continuous feature vectors as representing image regions, and the binary vectors representing words. So this neural network would be able to predict the word tiger from an image with an orange stripey bit.
Note that this is not an easy project. A foray into the literature is a must. It is quite likely that a neural network like this has never been constructed. Normally, a neural network is given a single feature vector and a simple output. It is possible to put the above data into that form by simply conjecturing that each feature vector should be associated with each word (i.e. hedging your bets, or distributing the uncertainty), and hoping that the neural network learns the right thing. This could be a worthwhile warm-up exercise whose results can form a naive baseline, but it does not really get to the heart of the problem. If you were then to weight this data based on the neural network output, and iterate this process, then you would have the algorithm in this paper, except that instead of an SVM you have a neural network. Since this has already been done, the focus of this project is to think of the kinds of representations that will allow a neural network to learn the essence of the above data in a novel way. In particular, the network likely should represent the fact that the words are predicted through some group of regions. If that group of regions is taken to be the entire set (the whole image), then the network would overtrain. As is always the case, it is critical to work towards good performance on data not used for training.
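A minimal sketch of that naive baseline (not the novel network this project is after), assuming the duplicated region/word pairs have already been assembled into arrays:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Naive baseline: every region (feature vector) of a document is labelled with
# that document's full binary word vector, and a standard network is trained
# on (single feature vector -> binary word vector) pairs.
# X_regions: (total_regions, n_features); Y_words: (total_regions, n_words).
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
# net.fit(X_regions, Y_words)                     # multi-label fit on the duplicated data

# At test time, combine per-region predictions for a document, e.g. by taking
# the maximum predicted probability of each word over the document's regions.
# doc_word_probs = net.predict_proba(doc_regions).max(axis=0)

The point of the project is to go beyond this: design a representation in which the network itself learns which group of regions a word should be predicted from.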
To do this project, you will have to convince me that you have a solid understanding of why this is a difficult project, and why it is not a standard neural network project.